Metabolism is built on a foundation of organic chemistry, and employs structures and interactions
at many scales. Despite these sources of complexity, metabolism also displays striking and
robust regularities in the forms of modularity and hierarchy, which may be described compactly in
terms of relatively few principles of composition. These regularities render metabolic architecture
comprehensible as a system, and also suggest the order in which layers of that system came into
existence. In addition, metabolism serves as a foundational layer in other hierarchies, up to at
least the levels of cellular integration including bioenergetics and molecular replication, and trophic 
ecology. The recapitulation of patterns first seen in metabolism, in these higher levels, motivates 
us to interpret metabolism as a source of causation or constraint on many forms of organization in 
the biosphere. Many of the forms of modularity and hierarchy exhibited by metabolism are readily 
interpreted as stages in the emergence of catalytic control by living systems over organic chemistry, 
sometimes recapitulating or incorporating geochemical mechanisms. 

We identify as modules those subsets of chemicals and reactions, or subsets of functions, that are
re-used in many contexts with a conserved internal structure. At the small molecule substrate level,
module boundaries are often associated with the most complex reaction mechanisms, catalyzed by 
highly conserved enzymes. Cofactors form a biosynthetically and functionally distinctive control 
layer over the small-molecule substrate. The most complex members among the cofactors are often 
associated with the reactions at module boundaries in the substrate networks, while simpler cofac- 
tors participate in widely generalized reactions. The highly tuned chemical structures of cofactors 
(sometimes exploiting distinctive properties of the elements of the periodic table) thereby act as 
"keys" that incorporate classes of organic reactions within biochemistry. 

Module boundaries provide the interfaces where change is concentrated, when we catalogue extant 
diversity of metabolic phenotypes. The same modules that organize the compositional diversity 
of metabolism are argued, with many explicit examples, to have governed long-term evolution. 
Early evolution of core metabolism, and especially of carbon-fixation, appears to have required 
very few innovations, and to have used few rules of composition of conserved modules, to produce 
adaptations to simple chemical or energetic differences of environment without diverse solutions 
and without historical contingency. We demonstrate these features of metabolism at each of several 
levels of hierarchy, beginning with the small-molecule metabolic substrate and network architecture, 
continuing with cofactors and key conserved reactions, and culminating in the aggregation of multiple 
diverse physical and biochemical processes in cells. 



I. INTRODUCTION 



The chemistry of life is distinguished by being both 
highly ordered and far from thermodynamic equilib- 
rium pp. 1 This dynamical order is maintained by the 
non-equilibrium transfer of electrons through the bio- 
sphere. Free energy from potential differences between 
electron donors and acceptors can be derived from a vari- 
ety of biogeochemical cycles [3] , but within cells electron 
transfer is mediated by a small number of universal elec- 
tron carriers which drive a limited array of organic reac- 
tions [5]. Together these reactions make up metabolism, 
which governs the chemical dynamics both within organisms
and across ecosystems. The universal and apparently
conserved metabolic network transcends all known
species diversification and evolutionary change (HI H] , and
distinguishes the biosphere within the major classes of
planetary processes. 2 We identify metabolism with the
quite specific substrate architecture and hierarchical control
flow of this network, which provide the most essential
characterization of the chemical nature of the living state.

1 Applying the often-invoked term "far from equilibrium" to biochemistry
requires care. When catalysts (including transporters
or other molecular machinery) create a separation of timescales
between supported reactions and autonomous parasitic reactions,
the supported reactions can sometimes be treated as an isolated
subsystem with equilibrium approximations [2] [3] , though the
isolation itself is a cumulative deviation far from equilibrium.

Understanding the structure of metabolism is central 
to understanding how physics and chemistry constrain 
life and evolution. The polymerization of monomers 
into selected functional macromolecules, and the even 
more complex integration and replication of complete 
cells, form a well-recognized hierarchy of coordination 
and information-carrying processes. However, in the se- 
quence of biosynthesis these processes come late, and 
they involve a much smaller and simpler set of chemical
reactions than core metabolism, the network in which all
basic monomer components of biomass are created from
environmental inputs. Because the core is the origin of all
biomass, its flux is perforce higher than that in any secondary
process; only membrane electron transport (reviewed
in Ref. [4]) has higher energy flux. 3 Metabolism
is the sub-space of organic chemistry over which life has
gained catalytic control. Because in the construction and
optimization of biological phenotypes all matter flows
through this sub-space, its internal structure imposes a
strong filter on evolution.

2 Ref. [8] first proposed classifying the biosphere as the fourth "geosphere",
parallel to the lithosphere, hydrosphere, and atmosphere
that have provided a classical taxonomy in geology.

In this review we identify a number of organizing prin- 
ciples behind the major universal structures and func- 
tions of metabolism. They provide a simple character- 
ization of metabolic architecture, particularly in rela- 
tion to microbial metabolism, ecology, and phylogeny, 
and the major (biogeochemical) transitions in evolution. 
We often find the same patterns of organization reca- 
pitulated at multiple scales of time, size, or complex- 
ity, and can trace these to specific underlying chemistry, 
network topology, or robustness mechanisms. Acting as 
constraints and sources of adaptive variation, they have 
governed the evolution of metabolism since the earliest 
cells, and some of them may have governed its emergence. 
They allow us to make plausible reconstructions of the 
history of metabolic innovations and also to explain cer- 
tain strong evolutionary convergences and the long-term 
persistence of the core components of metabolic architec- 
ture. 

Many structural motifs in both the substrate and con- 
trol levels of metabolism may be interpreted as functional 
modules. By isolating effects of perturbation and error, 
modularity can both facilitate emergence, and support 
robust function, of hierarchical complex systems [10, 11].
It may also affect the large-scale structure of evolution
by favoring variation in the regulation and linkage between
modules, while conserving and thereby minimizing
disruption of their internal architecture and stability
[12, 13]. This can enhance evolvability through two
separate effects. An increased phenotypic (i.e. structural 
or functional, as opposed to genotypic or sequence) ro- 
bustness of individual modules gives access to larger ge- 
netic neutral spaces and thus a greater number of novel 
phenotypes at the boundaries of these spaces [14]. At 
the same time, concentrating change at module inter- 
faces, and allowing combinatorial variation at the mod- 
ule level, can decrease the amount of genetic variation 
needed to generate heritable changes in aggregate phe- 
notypes [TH US] . It has been argued that asymmetries 
in evolutionary constraints can be amplified through di- 
rect selection for evolvability, and that this is a central 
source of modularity and hierarchy within biological systems QMEE]-

These functional consequences of modularity lead us to 
expect that intermediary metabolism will be modular as 
a reflection of the requirements of emergence and inter- 
nal stability. Certainly we observe this empirically; many 
topological analyses of metabolic networks find a mod- 
ular and hierarchical structure [T5H2"T] . We also expect 
that, with the numerous and diverse constraints from 
chemistry and physics in core metabolism, and their large 
impact on metabolic flux, the evolutionary consequences 
of chemical modularity will be greatest from the core, 
and will diminish as chemical mechanisms are simplified, 
and the impact on flux is reduced, in more peripheral 
stages of biosynthesis. 

To understand the origin and evolutionary conse- 
quences of modularity in metabolism, however, we will 
need system-level representations that go beyond topol- 
ogy, to include sometimes quite particular distinctions of 
function. Details of substrate chemistry, enzyme group- 
ing and conservation, and phylogenies of metabolic mod- 
ules, in particular, are rich sources of functional infor- 
mation and context. These will enable us to reconstruct 
steps in metabolic evolution and identify their environ- 
mental drivers. 



A. Hierarchy in metabolism, and the role of 
individuals and ecosystems 

While most metabolic conversions are performed 
within cells, 4 the structure of metabolism spans many 
levels of biological organization. The causes and roles 
of evolutionary changes, even though they arise within 
cellular lineages, may be only partly explained by orga- 
nization at the cellular or species level. Other levels that 
must also be considered include the meta-metabolome 
of trophic ecosystems [22-24], and the links to geochemistry
[25-32]. The great biochemical cycles - of carbon,
nitrogen, phosphorus, or many metals - combine physiological,
ecological, and even geochemical links such as
mantle convection or continental weathering. The deepest
universal features of metabolism are reliably seen at
the ecosystem level [33] but not necessarily within or- 
ganisms [34] . 

These observations could be summarized as showing 
that individuality is a derived characteristic of living sys- 
tems within a larger framework of metabolic regularity, a 
perspective that fits well with the modern understanding 
that individuality takes many forms which must be explained
within their contexts [35]. Alternatively, in more
conventional genetic descriptions of evolution [351 [37],
metabolic completeness, trophic as well as physiological
flux balance, and network-level response to fluctuations
are explicit features contributing to an organism's fitness
within a co-evolving or constructed environment [38].

3 Ref. [9] notes that, over a broad sample of enzymes collected
from the literature, those for secondary metabolic reactions have
rates ~ 1/30 the typical rates of enzymes for core reactions.

4 Exceptions include siderophores and secreted enzymes, most often
used at the cell-population level.

We can to a considerable extent disentangle the inher- 
ent chemical hierarchy of metabolism from the evolution- 
ary hierarchy of species by studying variations in the an- 
abolic (biosynthetic) versus catabolic (degradative) path- 
ways within organisms, along with the relations of au- 
totrophy (self-feeding) versus heterotrophy (feeding from 
others) in the ecological roles of species. We can argue 
for the existence of a universal anabolic, autotrophic network
[39, 40] that comprises the chemistry essential to
life. We can then separate the structural requirements 
and evolutionary history of the universal network from 
secondary complexities, which we will argue originate in 
the diversification of species and the concurrent processes 
of assembly of ecological communities. 

Within the universal (and apparently essential) net- 
work we may identify further layers, with distinct func- 
tions and plausibly distinct origins. A functioning 
metabolism is both a network of fluxes through sub- 
strate molecules, and a set of hierarchical relations in 
which some of the more complex structures control the 
kinetics of flows within the network. Within the sub- 
strate network, distinguishable subnetworks include the 
core network to synthesize CHO backbones, networks ra- 
diating from the core that incorporate N, S, P, or metals, 
higher-order networks that assemble complex organics 
from "building blocks" , and still others that synthesize all 
forms of polymers from small organic monomers. Within 
the control hierarchy, the layers of cofactors, oligomer 
catalysts, and integrated cellular energetic and biosyn- 
thetic subsystems are qualitatively distinct. 

The foundation of autotrophy - and more generally the 
anchor that embeds the biosphere within geochemistry - 
is carbon-fixation, the transformation of CO2 into small 
organic molecules (see Fig. 1). A recent study [41] combining
evidence from phylogeny and metabolic network
reconstruction 5 showed that all carbon fixation pheno- 
types may be related by an evolutionary tree with very 
high (nearly perfect) parsimony, and a novel but sen- 
sible phenotype at the root. The branches represent- 
ing innovations in carbon fixation were found to trace 
the standard deep divergences of bacteria and archaea. 
More striking, likely environmental drivers could be iden- 
tified for most divergences, suggesting that deep evolu- 
tion reflects first incursions into novel geochemical envi- 
ronments. The tight coupling of the reconstructed phy- 
logeny to geochemical variety suggests that constraints 
from chemistry and energetics drove early evolution in 
predictable ways, leaving little need to invoke historical 
contingency. 



5 We refer to this approach as "phylometabolic" reconstruction. 




FIG. 1: The metabolic structure of the biosphere. The bio- 
sphere as a whole can be described as implementing a global 
biological carbon cycle based on CO2, with carbon-fixation as 
the metabolic foundation. The small organic molecules pro- 
duced during fixation of CO2 are subsequently transformed 
and built up into the full diversity of known biomolecules 
through the process of anabolism, before ultimately being 
broken back down through catabolism and re-released as CO2 
through respiration. The striking modularity of metabolism 
is expressed in the fact that the interface between carbon- 
fixation and anabolism consists of a very small number of 
small organic molecules (shown schematically at right-center). 
The key observation that in addition to intermediates in the
citric acid cycle - from which nearly all anabolic pathways emanate
[7] (see Fig. 6) - glycine (red) should be included in this
set allows a complete reconstruction of the evolutionary history
of carbon-fixation [41] (see also Figures 10 and 11). Abbreviations:
Acetyl-CoA (ACA); Pyruvate (PYR); Oxaloacetate (OXA);
α-Ketoglutarate (AKG); Glycine (GLY).



B. Catalytic control and origins of modularity in 
metabolism 

While carbon-fixation draws on all levels of biolog- 
ical organization (requiring integration and control of 
many cellular components), evolution in the network of 
its small-molecule substrate has consisted only of changes 
in the use of a small number of clearly defined reaction 
sequences. The disruption, disconnection, or reversal of 
these modules accounts for the full diversity of mod- 
ern carbon-fixation. As we will show below, the mod- 
ule structure is further defined by a distinction between 
two types of chemistry. Within modules, the reactions 
are mainly (de-)hydration or (de-)hydrogenation reac- 
tions, catalyzed by enzymes from common and highly- 
diversified families. Module interfaces are created (and 
distinguished) by key carboxylation reactions, catalyzed 
by highly conserved enzymes, often involving special 
metal centers and/or complex organic cofactors. The 
congruence of phylogenetic branching with topological 
and chemical module boundaries suggests that a very 
small number of catalytic innovations were the key bot- 
tlenecks to evolutionary diversification, against a back- 
ground of facile and readily re-used organic chemistry. 

Topological modularity in the small-molecule substrate 
network is often associated with functional divisions in 
the more complex molecules that control metabolism,
particularly the cofactors, showing that their metabolic 
role is also an evolutionary role. As carriers of electrons 
or essential functional groups, cofactors regulate kinetic 
bottlenecks in metabolic networks. The appearance and 
diversification of families of biosynthetically related co- 
factors introduced functions which served as "keys" to 
domains in organic chemistry, incorporating these within 
biochemistry. Often we may map biosynthetic pathway 
diversification of cofactors onto particular lineage diver- 
gences in the tree of life. Cofactor biosynthetic networks 
are themselves modular, with multiple biosynthetic pathways
in a family using closely related enzymes that enable
structures characteristic of the cofactor class. 

The quite sharply defined roles of many modules en- 
able us to understand strong evolutionary convergences 
that have occurred within fundamental biochemistry, and 
in some cases we can relate the functioning of an en- 
tire class of substrate or control molecules to specific 
chemical properties of elements or small chemical groups. 
Several important module boundaries are aligned at the 
same points in their substrate networks and their control 
layers. This suggests to us that lower-level substrate- 
reaction networks introduced constraints on the accessi- 
ble or robust forms of catalysis and aggregation that it 
was later possible to build up over them. From repeated 
motifs within the substructure of modules, and from pat- 
terns of re-use or convergence, we may identify chemical 
constraints on major transitions in metabolic evolution, 
and we may separate the early functions of promiscu- 
ous catalysts as enablers of chemistry, from later restric- 
tions of reactants as catalysts were made more specific. 
The remarkable fact that such low-level chemical distinc- 
tions (in elements, reactions, or small-molecule networks) 
should have created constraints on innovation well into 
the Darwinian era of modern cells suggests these as rel- 
evant constraints also in the pre-cellular era. 



C. Manuscript outline 

Our main message is twofold: 1) that the structure of 
biosynthetic networks and their observed variation, even 
though the networks are elaborate, has a compact repre- 
sentation in terms of a small collection of rules for com- 
position, and 2) that the same rules we abstract from 
composition have a natural interpretation as constraints 
on evolutionary dynamics, which as a generating process 
has produced the observed variants. We intend the ex- 
pression "logic of metabolism" to refer to the collection of 
architectural motifs and functions that have apparently 
been necessary for persistence of the biosphere, that have 
led to modularity in the physics and chemistry of life, and 
that have determined its major evolutionary contingen- 
cies and convergences. 

After a short description of the important global features
of metabolism in Sec. II, we will construct these
at ascending levels in the hierarchy, beginning in Sec. III
with the networks of core carbon fixation and the lowest
levels of intermediary metabolism. We will then, in
Sec. IV, consider cofactors as the intermediate level of
structure and the first level of explicit control in biochemistry,
illustrating how key cofactor classes govern
chemistry, illustrating how key cofactor classes govern 
the fixation and transfer of elementary carbon units, and 
introduce control over reductants and redox state. Both 
in the metabolic substrate and in the cofactor domain, it 
will be possible to suggest a specific historical order for 
many major innovations. For the substrate network this 
will capture conditional dependencies in the innovation 
of carbon fixation strategies. For cofactors it will allow 
us to approximately place the emergence of specific co- 
factor functionalities within the expansion of metabolic 
networks from inorganic inputs. 

In Sec. V we consider the processes by which innovation
occurs, specifically the interplay of the introduction of
general reaction mechanisms versus selectivity over substrates.
The modular substructure and evolutionary sequence
of many of our reconstructed innovations favor
an early role for non-specific catalysts, with substrate
selectivity appearing later. Finally, in Sec. VI we list
candidates for the major organizing constraints on integration
of metabolism within cells. These include the
role of compartments in linking energy systems, as well 
as the coupling of physiological and genetic individuality, 
which permit species differentiation, and complementary 
specialization within ecological assemblies. 



II. AN OVERVIEW OF THE ARCHITECTURE 
OF METABOLISM 

A. Anabolism and catabolism in individuals and
ecosystems

Metabolic networks within organisms are commonly 
characterized [42-45] as having three classes of pathways:
1) catabolic pathways that break down organic food to 
provide chemical "building blocks" or energy; 2) core 
pathways through which nearly all small metabolites pass 
during primary synthesis or ultimate breakdown, and 3) 
anabolic pathways that build up all complex chemicals 
from components originating in the core. This qual- 
itative characterization (which may be complicated by 
salvage pathways and other cross-linkages) is supported 
by a strong statistical observation that most minimal 
pathways connecting pairs of metabolites consist of a 
catabolic and an anabolic segment connected through the 
core [H] . Thus, relatively speaking, the catabolic and an- 
abolic pathways are less densely crosslinked than path- 
ways within the core, from which they radiate. Catabolic 
pathways in a cell may be fed through physiological or 
trophic links to other cells or organisms, or they may 
break down food produced previously by the same cell 
and then stored. Fig. 2 illustrates schematically the relation
of the three classes. Both catabolic and anabolic
pathways may be large and somewhat diversified; the
core itself constitutes no more than a few hundred small
metabolites [39, 40], most of which have functions that
are universal throughout the biosphere. 



[Figure 2 panel labels: Net Biosphere; Trophic ecosystem; Autotrophic Individual; Heterotrophic Individual]




FIG. 2: Global structure of metabolism. Anabolic pathways 
(blue) build biomass and catabolic pathways (red), which may 
be their direct reverses, break it down. Carbon enters the bio- 
sphere through the core (black), which is the starting point of 
anabolism and also the endpoint of respiration. Because the 
biosphere as a whole is autotrophic, anabolism is functionally 
prior to catabolism. Both single organisms and assemblies 
of autotrophs can possess metabolic charts consisting only 
of anabolic pathways that fan outward from the core (blue 
and green). By partitioning pathway directions between an- 
abolic and catabolic (joined at the core), organisms can take 
on the familiar "bowtie" architecture of derived metabolism 
(red with blue). Their assembly into trophic ecosystems (blue 
and red radial graph) then both builds and degrades organic 
compounds actively, cycling carbon between environmental 
CO2 and biomass (green). In these graphs, concentric (green) 
shells reflect sequential steps in biosynthesis leading to a hi- 
erarchy of increasing molecular complexity. 



Whole-organism metabolisms are conventionally divided
into two classes - autotrophs and heterotrophs - according
to the ways they combine anabolic and catabolic
pathways [5] . Autotrophs synthesize all required metabo- 
lites from inorganic precursors, and can function without 
catabolism, using only the core and anabolic pathways 
radiating from it. 6 Heterotrophs, in contrast, are organ- 
isms that must obtain organic inputs from their envi- 
ronments because they lack essential biosynthetic path- 
ways. As a result of this difference, the two classes of 
metabolism have fundamentally different ecological roles. 



6 Establishing this completeness can prove challenging, however [45].



Autotrophs form the lowest trophic level in the bio- 
sphere, fixing CO2 into organic matter, while het- 
erotrophs form all subsequent levels, determining the 
structure of flows of organic compounds in trophic 
webs [46], and actively cycling carbon from biomass back
to environmental CO2. While all biological free energy 
passes at some stage through redox couples, autotrophs 
capture a part of this energy by transferring electrons 
from high energy reductants to CO2 [?]. Heterotrophs 
may exploit incomplete use of this free energy through 
internal redox reactions (fermentation), or they may re- 
oxidize organic matter back to CO2 (respiration). 
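
The chemical distinction drawn above can be illustrated with a minimal reachability sketch. Everything in it is an assumption made for illustration: the function names `reachable_metabolites` and `trophic_class` are hypothetical, and the toy reactions are unbalanced placeholders rather than real pathway stoichiometry. The sketch only encodes the stated criterion that an autotroph can build every required metabolite from inorganic inputs, while a heterotroph lacks some essential biosynthetic step.

```python
def reachable_metabolites(reactions, seeds):
    """Return every metabolite that can be made starting from `seeds`.

    `reactions` is a list of (reactants, products) tuples of species names.
    A reaction fires once all of its reactants are available.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def trophic_class(reactions, inorganic_inputs, required):
    """Classify a reaction set by whether all required metabolites are reachable."""
    made = reachable_metabolites(reactions, inorganic_inputs)
    return "autotroph" if set(required) <= made else "heterotroph"


# Hypothetical toy network (illustrative only, not balanced chemistry).
complete_network = [
    (("CO2", "H2"), ("acetate",)),
    (("acetate", "CO2"), ("pyruvate",)),
    (("pyruvate", "NH3"), ("alanine",)),
]
# The same network with its carbon-fixation step removed.
incomplete_network = complete_network[1:]

inorganic = {"CO2", "H2", "NH3"}
required = {"acetate", "pyruvate", "alanine"}

complete_class = trophic_class(complete_network, inorganic, required)
incomplete_class = trophic_class(incomplete_network, inorganic, required)
```

In this toy setting the network that loses its first step can no longer reach any required metabolite from inorganic inputs alone, so it would need organic inputs from its environment.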

The role of catabolism in most organisms is closely tied 
to their ecological role as heterotrophs. Heterotrophy 
provides enormous opportunity for metabolic diversification
[34], in the evolution of catabolic pathways and
the partitioning of essential anabolic reactions among 
the constituent species within ecosystems. However, 
the study of metabolism restricted to particular het- 
erotrophic organisms 7 can obscure much of its universal- 
ity: heterotrophs may differ widely, but the aggregate an- 
abolic networks that sustain them at the level of ecosys- 
tems are largely invariant. Autotrophs show that much 
of this diversity is not essential to life, allowing us to 
conceptually separate the requirements for biosynthesis 
from complexities that originate in processes of individual
specialization and ecological assembly [47].

The motif of three-stage pathways - catabolic, core, 
anabolic - between typical pairs of metabolites, a mo- 
tif obtained through the study of heterotrophs, has been 
abstracted into a paradigm of "bowtie" architecture for 
metabolism [42-44]. 8 However, in combining universal
elements of metabolic dependency with widely variable 
physiological or ecological specializations, the "bowtie" 
can be misleading. The core and anabolism are essen- 
tial (and we argue more ancestral), and the reduction 
in cross-linking with distance from the core may be seen 
to reflect an entirely outgoing radial "fan" of anabolism. 
Biomass is organized in a sequence of concentric shells 
spanned by the radial pathways, which count the number
and complexity of biosynthetic steps [40]. Organisms
exist which can function without catabolism, but only the 
most derived parasites lack anabolism. Moreover, many 
of the common catabolic pathways are (approximate or
exact) reversals of widespread anabolic pathways, 9 and
are explained as consequences of ecological change. Finer
diversifications arise as adaptations to specific ecological
or geochemical environments. Therefore, in this review
we will emphasize core pathways and a subset of anabolic
pathways, as they contribute to the universal aspects of
autotrophic networks.

7 Almost all model organisms have been heterotrophs, because
these are accessible and are usually connected to humans as symbionts,
pathogens, or cultivars. E. coli (in which operons were
discovered) is a phenotypically and trophically very plastic organism
due to its complex lifecycle. All multicellular organisms
are heterotrophs, including plants, since these fix carbon but rely
on soil symbionts to fix nitrogen. The only known autotrophic organisms
are bacteria and archaea, and no autotroph is currently
well-developed as a model organism.

8 The paradigm of the metabolic bowtie is also in part a borrowing
from a conventional paradigm in engineering [42], motivated
by applications to human physiology and medicine (John Doyle,
pers. comm.).

The conceptual difference and asymmetry between au- 
totrophy and heterotrophy becomes clearer when we ex- 
amine the metabolic structure of ecosystems at increas- 
ing scales of aggregation. Entire ecosystems, to the ex- 
tent that they are approximately closed, function chem- 
ically as autotrophs. The biosphere as a whole is not 
only approximately, but fully autotrophic, as today it 
does not depend significantly on extraterrestrially, atmo- 
spherically, or geologically produced organics. This ob- 
servation still admits two possibilities for the emergence 
of aggregate metabolism: Either the biosphere has been 
autotrophic since its inception, or it was originally het- 
erotrophic and later switched to using CO2 as its sole 
carbon source. 

The Oparin-Haldane conjecture [313 EH] has motivated 
some consideration of a catabolic origin of life, but we can 
find no close empirical contact of this conjecture with fea- 
tures of extant life. Therefore we will assume the primacy 
of the core and anabolic pathways, and will consider the 
problem of emergence and early evolution of fully au- 
totrophic systems. As long as we do not conflate the 
chemical condition of autotrophy (complete anabolism) 
with assumptions about individuality (whether complete 
anabolisms are contained within the regulatory control of 
individual organisms) [47] , and as long as we recognize 
the ecosystem as potentially the correct level of aggrega- 
tion to define autotrophy, we need not assume that the 
first life consisted of autotrophic individual cells. Our in- 
terpretation extends equally to populations of organisms 
that were physiologically as well as genetically incomplete 
and functioned cooperatively [51-54].

Once organism-level and species-level organization has 
been put aside as a separate question, the chemical 
distinction between heterotrophy and autotrophy is a 
distinction between metabolic partial-systems with un- 
known and highly variable boundary conditions, versus 
whole-systems required to subsist on CO2 and reductant. 
If we wish to understand the structure of the biosphere 
and to interpret the sequence of innovations in core car- 
bon fixation, the added constraints of autotrophy provide 
a framework to do this. Finally, identifying the chemical 
nature of life with autotrophic metabolism, rather than 
modeling it on species heterotrophy as in the Oparin- 
Haldane conjecture, is compatible with a hypothesis of 
continuity in the emergence of life [37]. Rather than reinventing
metabolism as a palimpsest over earlier abiotic
sources [55], we suppose that organic life subsumed
and controlled organosynthetic processes of geochemical
origin [271155].

9 An example is glycolysis, which is the reverse of gluconeogenesis [48].



B. Network topology, self-amplification, and levels 
of structure 

Understanding either the emergence of life, or the ro- 
bust persistence of the biosphere, requires understand- 
ing life's capacity for exponential growth. Exponen- 
tial growth results from proportional self-amplification 
of metabolic and other networks that have an "autocatalytic"
topology [57-61] (see Fig. 3). Network autocatalysis
is a term used to describe a topological (stoichiometric)
property of the substrate network of chemical
reactions. In a catalytic network, one or more of the 
network intermediates is needed as a substrate to en- 
able the pathway to connect to its inputs or to convert 
them to outputs, but the catalytic species is regenerated 
by the stage at which the pathway completes. Network- 
catalytic pathways must therefore incorporate feedback 
and comprise one or more loops with regard to the in- 
ternally produced molecules. An autocatalytic network is 
a catalytic network augmented by further reactions that 
convert outputs to additional copies of the network cat- 
alyst, rendering the pathway self-amplifying. 10 

Network autocatalysis is necessary to maintain dy- 
namical ordered states, by re-concentrating inputs into 
a finite number of intermediates, against the disorder- 
ing effects of thermodynamic decay and continual ex- 
ternal perturbation. Therefore all observed persistent 
material flows in the biosphere can only be products of 
autocatalytic networks, though they may require hard- 
to-recognize feedbacks ranging from the level of cell 
metabolism to trophic ecology for full regeneration. This 
ex post observation does not, however, explain why self- 
amplification was possible in abiotic chemistry to give 
rise to a biosphere. In addition to topologies enabling 
feedback, the latter would have required that interme- 
diates in the network be produced at rates higher than 
those at which they were removed. 
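
The rate condition in the last sentence can be illustrated with a minimal sketch. It is not a model taken from this paper: we assume, purely for illustration, a single lumped pool of intermediates x whose regeneration and removal are both first order, with hypothetical rate constants k_gain and k_loss. Under that assumption dx/dt = (k_gain - k_loss) x, so the pool self-amplifies only when k_gain exceeds k_loss.

```python
import math


def pool_size(x0, k_gain, k_loss, t):
    """Closed-form solution of dx/dt = (k_gain - k_loss) * x.

    x0, k_gain and k_loss are hypothetical illustration values,
    not quantities measured or reported in the text.
    """
    return x0 * math.exp((k_gain - k_loss) * t)


def self_amplifies(k_gain, k_loss):
    """True when intermediates are regenerated faster than they are removed."""
    return k_gain > k_loss


if __name__ == "__main__":
    # Sweep the assumed regeneration constant across the threshold k_loss = 1.0.
    for k_gain in (0.8, 1.0, 1.2):
        print(k_gain, self_amplifies(k_gain, 1.0), pool_size(1.0, k_gain, 1.0, 10.0))
```

In this toy picture the case k_gain = k_loss is marginal: the pool neither grows nor decays, and any additional drain tips it into decay.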

The significant observations about autocatalysis in the
extant biosphere, which may also contain information
about its emergence, concern the complexity, number,
and particular form of levels in which autocatalytic
feedback can be found. Where the hierarchical modules
of metabolic structure or function follow the boundaries
required for feedback closure of different autocatalytic
sub-networks, it may be possible to order the appearance
of those sub-networks in time, and to infer
the geochemical supports they required for stability and
self-amplification, before those supports were attained
through integration into cellular biochemistry.

We wish, in these characterizations, to recognize what
we might call "conditional" as well as strict autocatalysis.
In extant organisms, where (essentially) all reactions
are catalyzed by macromolecules, and most cofactors
(reductants, nucleoside-triphosphates, coenzymes)
are recharged by cellular processes, strict autocatalysis
of any network is only satisfied in the context of the
full complement of integrated cellular processes. If, however,
inputs provided by cofactors, macromolecules and
energy systems in modern cells could have been provided
externally in earlier stages of life, for instance by minerals
or geochemical processes, then identifying networks
in extant biochemistry that, although simple, would be
autocatalytic if given these supports, may give information
about intermediate stages of emergence (see Fig. 3).
The strong modularity of extant metabolism and its
congruence with such conditionally autocatalytic topologies
suggest that a separation into layers corresponding to
stages of emergence may be sensible. In addition to
reconstructing historical stages, the mechanisms leading to
autocatalysis in different sub-systems may suggest
important geochemical contexts or sources of robustness
still exploited in modern metabolism.



C. Network-autocatalysis in carbon-fixation
pathways

At the chemically simplest level of description - that
of the small-molecule metabolic substrates and their
reaction-network topologies - carbon fixation pathways
form two classes. Five of the six known pathways
are autocatalytic loops, while one is a linear reaction
sequence. 11 The loop pathways condense CO2 or
bicarbonate onto their substrate molecules, lengthening them.
Each condensation is followed by a reduction, making the
average oxidation state of carbon in the pathway
substrate lower than that of the input CO2, and resulting
in a negative net free energy of formation in a reducing
environment [7]. 12 Each fixation loop contains one
reaction where the maximal-length substrate is cleaved
to produce two intermediates earlier in the same pathway,
resulting in self-amplification of the pathway flux.
As long as pathway intermediates are replenished faster
than they are drained by parasitic or anabolic side
reactions, the loop current remains above the autocatalytic
threshold. However, the threshold is fragile, as pathway
kinetics provide no inherent barrier against flux's falling
below threshold and subsequently collapsing. 13



10 Molecular autocatalysis - the property that intermediates in a
pathway serve as conventional molecular catalysts for other
reactions in the pathway - may be understood as a restricted form
of network autocatalysis in which the reaction to which some
species is an essential input is the same reaction that regenerates
that species. Some chemists prefer to use the term "network
autoamplification" for the general case, restricting "autocatalysis"
to apply only when species are traditionally-defined molecular
catalysts. We will use "autocatalysis" for the general case, to
reflect the property of stoichiometry that a pathway regenerates
essential inputs. For us the distinction between autocatalysis
at the single molecule versus more general network level mainly
affects the kinetics and regulation of pathways.

11 All uses of autocatalysis in this section refer to conditional
autocatalysis, taking as external support the same level of cofactor
or enzymatic complexity. Such external factors being equal, the
small-molecule substrate pathways of the loops display an
additional form of autocatalysis not present in the linear pathway.

12 Reducing power may originate in the geochemical environment,
but in modern cells electrons are transferred endergonically to
more powerful reductants such as NADH, NADPH, FADH2, or
reduced ferredoxin.



FIG. 3: Upper limit growth rate curves for chemical
reaction networks of different classes. To highlight the role
of network topologies, the chemistry is simplified with only
C-C bond forming and cleaving reactions shown. Each of
the growth curves qualitatively represents the upper limits
for mass accumulation within the participating substrates of
the pathways. When fully integrated within modern cellular
biochemistry, both linear and cyclic pathways are network
autocatalytic and capable of exponential growth. This form
of network autocatalysis, however, derives from feedback
provided by cofactors and enzymes, both of which have elaborate
synthesis pathways, and is thus classified as "long-loop"
autocatalysis. In an abiotic world in which reaction-level
catalysis is limited to external sources, only cyclic pathways with
feedback topologies at the substrate level - correspondingly
classified as "short-loop" autocatalysis - are capable of
exponential growth, while linear pathways are limited to linear
growth. We contend that an early presence of short-loop
autocatalysis is important because it provides a mechanism to
concentrate mass flux within abiotic chemical networks,
preventing excessive dilution with increasing size and complexity
of organic molecules, and in turn giving easier and more
robust access to subsequent long-loop feedback closures.

At the level of network topology, the linear
Wood-Ljungdahl (WL) fixation pathway [62-64] is strikingly
unlike the five loop pathways. Instead of covalently binding
CO2 onto pathway substrates, which then serve as
platforms for reduction, the WL reactions directly reduce
one-carbon (C1) groups, and then distribute the partly-
or fully-reduced intermediates to other anabolic pathways
where they are incorporated into metabolites. The
linear sequence of reductions has no feedback, and the C1
groups at intermediate oxidation states do not increase in
complexity. Instead, these reductions (leading to intermediate
C1 states that would be unstable in solution) are
carried out on evolutionarily refined folate cofactors [65].
The topology of the WL pathway becomes self-amplifying
only if the larger and more complex biosynthetic network
for these cofactors is considered together with that of
the C1 substrate. We will characterize this distinction
between the loop-fixation pathways and WL as a distinction
between short-loop and long-loop autocatalysis (see
Fig. 3). 14 Short loops contain only the small-molecule
substrates; long loops incorporate the biosynthetic net- 
works for cofactors as well. 
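
The difference in growth law between the two topologies (see Fig. 3) can be sketched in a few lines. This is an illustrative toy, not the authors' calculation: it assumes discrete pathway turns, a hypothetical fixed external flux for a pathway without substrate-level feedback, and a hypothetical number of catalyst copies returned per turn for a short-loop cycle. Drains and saturation are ignored.

```python
def linear_growth(input_flux, turns):
    """Accumulated mass when throughput is fixed by an external supply.

    Without substrate-level feedback, each turn adds the same amount.
    """
    mass = 0.0
    history = []
    for _ in range(turns):
        mass += input_flux
        history.append(mass)
    return history


def short_loop_growth(seed, copies_per_turn, turns):
    """Network-catalyst pool when each turn returns `copies_per_turn` copies.

    With copies_per_turn = 2 the pool doubles every turn.
    """
    pool = seed
    history = []
    for _ in range(turns):
        pool *= copies_per_turn
        history.append(pool)
    return history


if __name__ == "__main__":
    print(linear_growth(1.0, 5))         # [1.0, 2.0, 3.0, 4.0, 5.0]
    print(short_loop_growth(1.0, 2, 5))  # [2.0, 4.0, 8.0, 16.0, 32.0]
```

Under these assumptions a long-loop pathway behaves like the first function until its cofactor network is also closed, at which point the second function applies to the combined substrate and cofactor system.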

The network catalysts that could be said to "select" 
the short-loop pathways are the reaction intermediates 
themselves. The key metabolites that have the corre- 
sponding selection role for WL are the folate cofactors 
produced in a secondary biosynthetic network. Short- 
loop and long-loop pathways are therefore distinguished 
both by the number of reactions that must be maintained 
and regulated, and by the fact that WL spans substrates 



and cofactors, which we will argue in Sec. IV are nat- 
urally interpreted as qualitatively distinct layers within 
biochemistry. 

The appearance of different features suggesting sim- 
plicity or primordial robustness, in different fixation 
pathways, together with aspects of their phylogenetic dis- 
tribution, have led to diverse proposals about the order of 
their emergence [35] EZ]. WL is the only carbon-fixation
pathway found in both bacteria and archaea, and its
reactions have been shown to have abiotic mineral
analogues [6S] [551 IM], suggesting a prebiotic origin. Yet
WL is not self-amplifying and so lacks the capacity for
chemical "competitive exclusion" (equivalent to the ca- 
pacity for exponential growth). The cofactors that make 



it self-amplifying are complex, and the simple pathway 
structure of Ci reduction does not suggest what would 
have supported their formation. 

In contrast, autocatalysis within the small-molecule 
substrate networks of the loop pathways suggests 
the inherent capacity for self-amplification, exponential 
growth, and chemical competitive exclusion. This is an 
appealing explanation [7] for the role, particularly of the
intermediates in the reductive citric acid cycle [I)] [70]
(discussed in Sec. III) as precursors of biomass. Arcs within



this pathway have also been reproduced experimentally 
in mineral environments [71], though a self-amplifying
system has not yet been demonstrated. However, self- 
amplification requires complete loops, and even the most 
compelling candidate for a primordial form (reductive cit- 
ric acid cycling) is found only in a subset of bacterial 
clades. 

We argue in the next section that a joint fixation path- 
way incorporating both WL and citric acid cycling re- 
solves many of these ambiguities in a way that no modern 
fixation pathway can. 15 As a phylogenetic root, it defines 
a template from which the fixation pathways in all mod- 
ern clades could have diverged, and as a candidate for a 
primordial metabolic network, it provides both chemical 
selection of biomass precursors by short-loop autocatal- 
ysis, and a form of protection against the fragility of the 
autocatalytic threshold. We will first describe the bio- 
chemistry and phylogenetics of carbon-fixation pathways 
in the current biosphere, and then show how their pat- 
terns of modularity and chemical redundancy provide a 
framework for historical reconstruction. 



III. CORE CARBON METABOLISM 

Currently six carbon fixation pathways are known [7U 
[75]. While they are distinct as complete pathways, 
they have significant overlaps at the level of individual 
reactions, and even greater redundancy in local-group 
chemistry. They are also, as shown in Fig. 5 (below),
tightly integrated with the main pathways of core car- 
bon metabolism, including lipid synthesis, gluconeogen- 
esis, and pentose-phosphate synthesis. 

An extensive analysis of their chemistry under phys- 
iologically relevant conditions has shown that individ- 
ual fixation pathways contain two groups of thermo- 
dynamic bottlenecks: carboxylation reactions, and car- 
boxyl reduction reactions [71]. In isolation these reac- 
tions generally require ATP hydrolysis to proceed, and 
how pathways deal with (or avoid) these costs has been
concluded to form an important constraint on their
internal structure [74]. We will further show how the
elaborate and complex catalytic mechanisms associated with
these reactions form essential evolutionary constraints on
metabolism.



13 The autocatalytic threshold and dynamics of growth, saturation,
or collapse are considered in Sec. D| and shown in Fig. 12.

14 In the network context of the long-loop WL fixation mechanism, the
folate cofactors have an intermediate role between network
catalysts and molecular catalysts, as they are passive carriers, but
form stable molecular intermediates rather than mere complexes
as are formed by enzymes with their substrates.

15 A proposal for WL fixation followed by citric-acid cycling is made
in the text of Ref. 69, though the primordial networks proposed
in that paper are forms of acetogenesis, and do not emphasize
self-amplification and short-loop autocatalysis as essential early
requirements.

We will first describe the biochemical and phylogenetic 
details of the individual pathways, and then diagram 
their patterns of redundancy, first at the level of modu- 
lar reaction sequences, and then in local-group chemistry. 
Finally we will use this decomposition together with evi- 
dence from gene distributions to propose their historical 
relation and identify constraints that could have spanned 
the Darwinian and pre-cellular eras. 



A. Carbon fixation pathways 

1. Overview of pathway chemistries, phylogeny and 
environmental context 

Wood-Ljungdahl: The Wood-Ljungdahl (WL) pathway
[62-64, 66] consists of a sequence of five reactions
that directly reduce one CO2 to a methyl group, a parallel
reaction reducing CO2 to CO, and a final reaction
combining the methyl and CO groups with each other
and with a molecule of Coenzyme-A (CoA) to form the
thioester acetyl-CoA. The reactions are shown below in
Fig. 4 and discussed in detail in Sec. IV. The five steps
reducing CO2 to -CH3 make up the core pathway of
folate (vitamin B9) chemistry and its archaeal analog,
which we consider at length in Sec. III B. The reduction
to CO, and the synthesis of acetyl-CoA, are performed
by the bi-functional CO-Dehydrogenase/Acetyl-CoA
Synthase (CODH/ACS), a highly conserved enzyme
complex with Ni-[Fe4S5] and Ni-Ni-[Fe4S4] centers [75- 175].
Methyl-transfer from pterins to the ACS active site
is performed by a corrinoid iron-sulfur protein (CFeSP)
in which the cobalt-tetrapyrrole cofactor cobalamin
(vitamin B12) is part of the active site [791 15U].

Phylogenetically, WL is a widely distributed pathway, 
found in a variety of both bacteria and archaea, in- 
cluding acetogens, methanogens, sulfate reducers, and 
possibly anaerobic ammonium oxidizers [72]. The full
pathway is found only in strict anaerobes, because the
CODH/ACS is one of the most oxygen-sensitive enzymes
known [81, 82]. 16 However, as we have argued in Ref. [41],
the folate-mediated reactions form a partly-independent
sub-module. This module combines with the
equally-distinctive CODH/ACS enzyme to form the complete
WL pathway, but can serve independently as a partial
carbon-fixation pathway even in the absence of the final
step to acetyl-CoA (see Fig. 4). In this role it is
found almost universally among deep bacterial clades.



FIG. 4: The reactions in the Wood-Ljungdahl pathway of
direct C1-reduction. The main sequence on pterins is shown,
with five outputs for formyl, methylene, or methyl groups.
The semi-independent submodule often used to directly
synthesize glycine and serine from CO2, even when acetyl-CoA
synthesis is absent, is highlighted in red. Alternative
pathways to glycine and serine, from 3-phosphoglycerate in
gluconeogenesis/glycolysis and glyoxylate, are shown in the upper
right quadrant. Finally, the dashed arrows represent a
suggested alternative form of formate uptake based on binding at
N5 rather than N10 of folate before cyclization to
methenyl-THF El.

All carbon fixation pathways in extant organisms em- 
ploy some essential and apparently unique enzymes and 
most also rely in essential ways on certain cofactors. 17 
Among these, however, the function provided by pterin 
cofactors 18 in WL is arguably the most complex,
extending beyond mere reduction. Pterins mediate capture of
formate, reduction of carbon bound to one or two nitrogen
atoms, and transfer of formyl, methylene, or methyl
groups. 19 In this sense the simple network topology of
direct C1 reduction seems to require a more elaborate
dependence on cofactors than is seen in other pathways.



16 Recent results (S. Ragsdale, pers. comm.) suggest that the
CODH/ACS is also sensitive to sulfides and perhaps other
oxidants. We will impute this oxidant sensitivity as the reason
the CODH/ACS was lost in the ancestral environments of some
clades in Sec. III D 2. For later branches the oxidant may have
been molecular oxygen, but O2 is not a plausible toxin at the
time of the LUCA or earliest phylogenetic separations.

17 For example, the 3-hydroxypropionate pathway relies on biotin
for reactions shared with (or homologous to) those in fatty acid
synthesis. The reductive citric-acid cycle relies on reduced
ferredoxin [S3], a simple multi-domain [Fe4S4] enzyme, and on
thiamine in its reductive carbonyl-insertion reaction [84], and also
on biotin for its β-carboxylation steps [85, 86].

18 Pterin is a name referring to the class of cofactors including
folates and the methanopterins, which are both derived from a
neopterin precursor.

Reductive citric-acid cycle: The reductive citric-acid
(reductive Tricarboxylic Acid, or rTCA) cycle [70, 1ST]
is the reverse of the oxidative Krebs cycle, which
was by all evidence derived from it [33, 88] during
the rise of oxygen. It is a sequence of eleven intermediates
and eleven reactions, highlighted in Fig. 5, which
reduce two molecules of CO2, and combine these through
a substrate-level phosphorylation with CoA, to form one
molecule of acetyl-CoA. In the cycle, one molecule of
oxaloacetate grows by condensation with two CO2 and
is reduced and activated with CoA. The result, citryl-CoA,
undergoes a retro-aldol cleavage to regenerate
oxaloacetate and acetyl-CoA. 20 A second arc of reactions,
sometimes termed anaplerotic jS], condenses two further
CO2 with acetyl-CoA to produce a second molecule of
oxaloacetate, completing the network-autocatalytic
topology and making the cycle self-amplifying. The distinctive
reaction in the rTCA pathway is a carbonyl insertion at
a thioester (acetyl-CoA or succinyl-CoA), performed by
a family of conserved ferredoxin-dependent oxidoreductases
which are triple-Fe4S4-cluster proteins [SI]. The
cycle is found in many anaerobic and microaerophilic
bacterial lineages, including Aquificales, Chlorobi, and
δ- and ε-proteobacteria.
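
The self-amplifying stoichiometry described above can be checked by simple carbon bookkeeping. The sketch below is an illustration, not the authors' analysis: it lumps the cycle into three arcs, counts only carbon atoms (oxaloacetate C4, citryl-CoA C6, the acetyl group of acetyl-CoA C2), and omits reductant, ATP, water and the CoA carrier. The names CARBONS, ARCS, carbon_total and net_change are introduced here for illustration only.

```python
from collections import Counter

# Carbon atoms per species; the CoA carrier itself is not counted.
CARBONS = {"CO2": 1, "acetyl-CoA": 2, "oxaloacetate": 4, "citryl-CoA": 6}

# Lumped arcs of the cycle as (consumed, produced) species counts.
ARCS = [
    # Oxaloacetate grows by condensation with two CO2.
    ({"oxaloacetate": 1, "CO2": 2}, {"citryl-CoA": 1}),
    # Retro-aldol cleavage regenerates oxaloacetate and releases acetyl-CoA.
    ({"citryl-CoA": 1}, {"oxaloacetate": 1, "acetyl-CoA": 1}),
    # Anaplerotic arc: acetyl-CoA plus two further CO2 gives oxaloacetate.
    ({"acetyl-CoA": 1, "CO2": 2}, {"oxaloacetate": 1}),
]


def carbon_total(side):
    """Total carbon atoms on one side of a lumped reaction."""
    return sum(CARBONS[name] * count for name, count in side.items())


def net_change(arcs):
    """Net species change over all arcs, dropping species that cancel."""
    net = Counter()
    for consumed, produced in arcs:
        net.subtract(consumed)
        net.update(produced)
    return {name: count for name, count in net.items() if count != 0}


if __name__ == "__main__":
    print(all(carbon_total(c) == carbon_total(p) for c, p in ARCS))
    print(net_change(ARCS))
```

Each arc conserves carbon, and the net result of one full turn is one additional oxaloacetate from four CO2, which is the doubling of the network catalyst that makes the cycle self-amplifying.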

Enzymes from reductive TCA reactions are very widely 
distributed among bacteria, where they support fermen- 
tative pathways that break cycling and use intermediates 
such as succinate as terminal electron acceptors [j|T]. The
co-presence of variant enzymes associated with reductive
and oxidative cycling [92-94] may provide detailed
evidence about the reversal of core metabolism under the
rise of oxygen. 

Dicarboxylate/4-hydroxybutyrate cycle: The
dicarboxylate/4-hydroxybutyrate (DC/4HB) cycle [66l US],
illustrated in Fig. 7 (below), is, like rTCA, a single-loop
network-autocatalytic cycle, but has a simpler form
of autocatalysis in which acetyl-CoA rather than
oxaloacetate is the network catalyst. Only two CO2
molecules are attached in the course of the cycle to form
acetoacetyl-CoA, which is then thioesterified at the second
acetyl moiety and cleaved to directly regenerate two
molecules of acetyl-CoA. An extra copy of the network
catalyst is thus directly regenerated (with suitable CoA
activation) without the need for anaplerotic reactions.
The cycle has so far been found only in anaerobic
crenarchaeota, but within this group it is believed to be widely
distributed phylogenetically [55] [SS]. The first five
reactions in the cycle (from acetyl-CoA to succinyl-CoA)
are identical to those of rTCA. The second arc of the
cycle begins with reactions found also in 4-hydroxybutyrate
and γ-aminobutyrate fermenters in the Clostridia (a
subgroup of Firmicutes within the bacteria), and terminates
in the reverse of reactions in the isoprene biosynthesis
pathway. The DC/4HB pathway thus uses the same
ferredoxin-dependent carbonyl-insertion reaction used in
rTCA (though only at acetyl-CoA), along with distinctive
reactions associated with 4-hydroxybutyrate fermentation.
In particular, the dehydration/isomerization sequence
from 4-hydroxybutyryl-CoA to crotonyl-CoA is
performed by a flavin-dependent protein containing an
[Fe4S4] cluster, and involves a ketyl-radical intermediate
[sun?].



19 The distinctive role of cofactors continues with the dependence
of the acetyl-CoA synthesis on cobalamin, a highly reduced
tetrapyrrole capable of two-electron transfer [79].

20 Here we separate the formation of citryl-CoA from its subsequent
retro-aldol cleavage, as this is argued to be the original reaction
sequence, and the one displaying the closest homology in the
substrate-level phosphorylation with that of succinyl-CoA [89].

3-hydroxypropionate bicycle: The
3-hydroxypropionate (3HP) bicycle [55], highlighted
in Fig. 9, has the most complex network topology of the
fixation pathways, using two linked cycles to regenerate 
its network catalysts and to fix carbon. The network 
catalysts in both loops are acetyl-CoA and the outlet 
for fixed carbon is pyruvate. The reactions in the 
cycle begin with the biotin-dependent carboxylation of 
acetyl-CoA to form malonyl-CoA, from the fatty-acid
synthesis pathway, followed by a distinctive
thioesterification [98] and a second, homologous carboxylation
of propionyl-CoA (to methylmalonyl-CoA) followed by
isomerization to form succinyl-CoA. The first cycle 
then proceeds as the oxidative TCA arc, followed by 
retro-aldol reactions also found in the glyoxylate shunt 
(described below). A second cycle is initiated by an 
aldol condensation of propionyl-CoA with glyoxylate 
from the first cycle to yield β-methylmalyl-CoA, which
follows a sequence of reduction and isomerization 
through an enoyl intermediate (mesaconate) similar to 
the second cycle of rTCA or the 4HB pathway. This 
complex pathway was discovered in the Chloroflexi 
and is believed to represent an adaptation to alkaline 
environments in which the CO2/HCO3- (bicarbonate)
equilibrium strongly favors bicarbonate. All carbon 
fixations proceed through activated biotin, thus avoiding 
the carbonyl insertion of the rTCA and DC/4HB 
pathways. The complexity of the bicycle network likely 
reflects the difficulty of replacing both carbonyl insertion 
reactions from an ancestral rTCA cycle while retaining 
autocatalysis, but it also suggests the diverse inventory 
of pathway segments available to draw from at the time 
of its emergence, which reflect an underlying chemical 
simplicity and redundancy, as we will show. 

3-hydroxypropionate/4-hydroxybutyrate cycle:
The 3-hydroxypropionate/4-hydroxybutyrate
(3HP/4HB) cycle [99], shown in Fig. 7, is a single-loop
pathway in which the first arc is the 3HP pathway, and
the second arc is the 4HB pathway. Like DC/4HB,
3HP/4HB uses acetyl-CoA as network catalyst and
fixes two CO2 to form acetoacetyl-CoA. The pathway
is found in the Sulfolobales (crenarchaeota), where it
combines the crenarchaeal 4HB pattern of autotrophic 
carbon fixation with the bicarbonate adaptation of the 
3HP pathway. Like the 3HP bicycle, the 3HP/4HB 
pathway is thought to be an adaptation to alkalinity,
but because the 4HB arc does not fix additional carbon, 
this adaptation resulted in a simpler pathway structure 
than the bicycle. 

Calvin-Benson-Bassham cycle: The
Calvin-Benson-Bassham (CBB) cycle [100, 101] is responsible for
most known carbon fixation in the biosphere. In the
same way as WL adds only the distinctive CODH/ACS 
reaction to an otherwise very-widely-distributed folate 
pathway [41], the CBB cycle adds a single reaction 
to the otherwise-universal network of aldol reactions 
among sugar-phosphates that make up the gluconeogenic 
pathway to fructose 1,6-bisphosphate and the reductive 
pentose phosphate pathway to ribose and ribulose 1,5- 
bisphosphate. 21 The distinctive CBB reaction that ex- 
tends reductive pentose-phosphate synthesis to a car- 
bon fixation cycle is a carboxylation performed by the 
Ribulose 1,5-bisphosphate Carboxylase/Oxygenase (Ru- 
bisCO), together with cleavage of the original ribulose 
moiety to produce two molecules of 3-phosphoglycerate. 
The Calvin cycle resembles the 4HB pathways in regen- 
erating two copies of the network catalyst directly, not 
requiring separate anaplerotic reactions for autocatalysis.
In addition to carboxylation, RubisCO can react with
oxygen in a process known as photorespiration [102-104]
to produce 2-phosphoglycolate (2PG), a precursor to
glyoxylate that is independent of rTCA-cycle reactions. The
CBB cycle is widely distributed among cyanobacteria, in
chloroplasts in plants, and in some secondary endosym- 
bionts. 

The glyoxylate shunt: Although it is not an au- 
totrophic carbon-fixation pathway, the glyoxylate shunt 
(or glyoxylate bypass) is of interest because it shares in- 
termediates and reactions with many of the above fixa- 
tion pathways, and because it resembles a fixation path- 
way in certain topological features. The pathway is 
shown below in Fig. 9. All aldol reactions that can be
performed starting from rTCA intermediates appear in
this pathway, either as cleavages or as condensations. In
addition to condensation of acetate and oxaloacetate to
form citrate, these include cleavage of isocitrate to form
glyoxylate and succinate, and condensation of glyoxylate
and acetate to form malate. The shunt is a weakly
oxidative pathway (generating one H2-equivalent from
oxidizing succinate to fumarate), and is otherwise a network
of internal redox reactions. It is therefore a very
widely-used facultative pathway under conditions where carbon
for biosynthesis, more than reductant, is limiting.



21 The universality of this network requires some qualification. We
show a canonical version of the network in the figures below,
and some variant on this network is present in every organism
that synthesizes ribose. However, the (CH2O)n stoichiometry
of sugars, together with the wide diversity of possible aldol
reactions among sugar-phosphates, make sugar re-arrangement a
problem in the number theory of the small integers, with
solutions that may depend sensitively on allowed inputs and
outputs. Other pathways within the collection of attested
pentose-phosphate networks are shown in Ref. [61].

Two of the arcs of the shunt overlap with arcs in the 
oxidative Krebs cycle, but the entire pathway is a bicy- 
cle much like the 3HP-bicycle, sharing many of the same 
intermediates, but running in the opposite direction. Ox- 
idative pathways such as the Krebs cycle are ordinarily 
catabolic, and hence not self-maintaining. The glyoxy- 
late shunt may be regarded as a network-autocatalytic 
pathway for intake of acetate, using malate as the net- 
work catalyst and regenerating a second molecule of 
malate from two acetate molecules. This may be part 
of the reason that the shunt is up-regulated in the 
Deinococcus-Thermus family of bacteria in response to
radiation exposure [105], providing additional robustness
from network topology under conditions when metabolic 
control is compromised. 



2. Thermodynamic constraints on pathway structure 

The central energetic costs of carbon-fixation pathways 
are associated with carboxylation reactions in which CO2 
molecules are added to the growing substrate, and the 
subsequent reactions in which the carboxyl group is
reduced to a carbonyl [71]. In isolation these reactions
require ATP hydrolysis, but these costs can be avoided 
in several ways. In some cases a thioester intermediate 
is used to effectively couple together a carboxyl reduc- 
tion and a subsequent carboxylation, allowing the two 
reactions to be driven by a single ATP hydrolysis. An 
unfavorable (endergonic) reaction can also be coupled to 
a highly favorable (exergonic) reaction, allowing the re- 
actions to proceed without ATP hydrolysis. 
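
The coupling argument amounts to adding free-energy changes: a coupled pair can proceed when the sum is negative even if one member alone is positive. The sketch below only illustrates that arithmetic. The two values are hypothetical placeholders chosen for illustration, not measured free energies of any reaction in these pathways, and the function names are likewise introduced here.

```python
def coupled_delta_g(*delta_g_values):
    """Net free-energy change (kJ/mol) of reactions forced to proceed together."""
    return sum(delta_g_values)


def is_favorable(delta_g):
    """A reaction, or coupled set, can proceed spontaneously when delta G < 0."""
    return delta_g < 0


# Hypothetical placeholder values in kJ/mol, for illustration only.
ENDERGONIC_STEP = 20.0
EXERGONIC_STEP = -35.0

if __name__ == "__main__":
    # The endergonic step alone is unfavorable.
    print(is_favorable(ENDERGONIC_STEP))
    # Coupled to the exergonic step, the pair is favorable overall.
    print(is_favorable(coupled_delta_g(ENDERGONIC_STEP, EXERGONIC_STEP)))
```

In the pathways discussed here the role of the exergonic partner is played either by ATP hydrolysis or, when that cost is avoided, by another reaction in the same pathway.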

Individual pathways employ such couplings to varying 
degrees, resulting in a range of ATP costs associated with 
carbon fixation. At the low end, WL eliminates nearly all 
use of ATP through its unique pathway chemistry. Using
folates to reduce one-carbon units derived from CO2
before incorporating them into growing substrates avoids
the cost of carboxylation, saving an ATP, while the
endergonic reduction of CO2 to CO is coupled to the
subsequent exergonic synthesis of acetyl-CoA. Finally, the
activated thioester bond of acetyl-CoA allows the
subsequent carboxylation to pyruvate to also proceed without
additional ATP. As a result WL requires only a single ATP,
for the attachment (and activation) of formate on THF 22,
in the synthesis of pyruvate from CO2 [SHIH]. Similarly,
rTCA has high energetic efficiency as a result of extensive
reaction coupling, requiring only 2 ATP to synthesize
pyruvate from CO2 [Ml IZ3]. Two ATP are saved by coupling
carboxyl reductions to subsequent carboxylations
using thioester intermediates, and an additional ATP is
saved by coupling the carboxylation of α-ketoglutarate to
the subsequent carbonyl reduction leading to isocitrate.



22 In methanogens this cost has been completely eliminated by
modifying the structure of THF to that of H4MPT gl]

23 Here, by stoichiometry we refer to the mole-ratios of reactants
and products for each reaction, with molecules represented by
their CHO constituents, and attached phosphate or thioester
groups omitted. Where phosphorylation or thioesterification
mediates a net dehydration, we have represented the dehydration

At the high end of energetic cost of carbon-fixation
are pathways that couple unfavorable reactions less
effectively, or not at all, or even hydrolyze ATP for
reactions other than carboxylation or carboxyl reduction.
Both the DC/4HB pathway and the 3HP bicycle decouple
one or more of the thioester-mediated carboxyl reduction
+ carboxylation sequences such as used in rTCA,
and neither couples endergonic carboxylations to exergonic
reductions. As a result DC/4HB requires 5 ATP
and the 3HP bicycle 7 ATP to synthesize pyruvate from
CO2 [66, 74]. The 3HP/4HB pathway has the highest
cost of any fixation pathway, with 9 ATP required to
synthesize pyruvate from CO2. This is partly because it
also decouples thioester-mediated carboxyl reduction +
carboxylation sequences, and partly because pyruvate is
synthesized by diverting and ultimately decarboxylating
succinyl-CoA [53 [73 [73]. Finally, CBB is also at the high
end in terms of cost, requiring 7 ATP to synthesize
pyruvate from CO2. Although this pathway avoids the cost
of carboxylation reactions by coupling them to exergonic
cleavage reactions, CBB is the only fixation pathway that
invests ATP hydrolysis in chemistry other than
carboxylations or carboxyl reductions, relatively increasing its
cost [74].



3. Centrality and universality of the reactions in the 
citric-acid cycle, and the pillars of anabolism 

The apparent diversity of six known fixation pathways 
is unified by the role of the citric-acid cycle reactions, and 
secondarily by that of gluconeogenesis and the pentose- 
phosphate pathways. Fig. [5] presents the C,H,0 stoi- 
chiometry 23 for a network of reactions that includes all 
six known pathways. The network contains only 35 or- 
ganic intermediates, 24 because many intermediates and 
reactions appear in multiple pathways. 

directly in the figure. 
24 Hydroxymethyl-glutarate and butyrate are also shown, to indi- 
cate points of departure to isoprene and fatty acid synthesis, 
respectively. 



In Fig. [5] the TCA cycle and the gluconeogenic pathway
are highlighted. Beyond being mere points of departure 
for alternative fixation pathways and for diversifications 
in intermediary metabolism, they are invariants under di- 
versification because they determine carbon flow among 
the universal precursors of biosynthesis. 

Almost all anabolic pathways in extant organisms
originate in one of five intermediates in the TCA cycle -
acetate (as acetyl-CoA), pyruvate, oxaloacetate, succinate
(or succinyl-CoA) or α-ketoglutarate - which have been
dubbed the "pillars of anabolism" [30]. Succinyl-CoA
can serve as the precursor to pyrroles (metal-coordinating
groups in many cofactors) - mainly in α-proteobacteria
and mitochondria - but these are more commonly made
from α-ketoglutarate via glutamate in what is known as
the C5 pathway [106]. A phylogenetic analysis of these
pathways confirms that the C5 pathway is the most
plausible ancestral form [107]. Thus as few as four TCA
intermediates provide the organic inputs to all anabolic
pathways. Fig. [6] shows the major molecule classes
associated with each intermediate. The only exceptions to
this universality, which form a biosynthetic sequence, are
glycine, serine, and a few compounds synthesized from
them; this sequence can be initiated directly from CO2
outside of the pillars (see Fig. [4]), an observation that
becomes key in reconstructing the evolutionary history of
carbon-fixation (see Sec. IIIC). The gluconeogenic pathway
then forms a similarly unique bridge between the
TCA intermediate pyruvate (in the activated form
phosphoenolpyruvate) and the network of sugar-phosphate
reactions known as the pentose-phosphate pathway.

Carbon-fixation pathways must reach all four (or five) 
of the universal anabolic starting compounds. They may 
do this either by producing them as pathway interme- 
diates, or by means of secondary reactions converting 
pathway intermediates into the essential precursors. The 
degree to which a pathway passes through all essential 
biosynthetic precursors may suggest its antiquity. In 
metabolism-first theories of the origin of life [S], the
limited set of compounds selected and made available in
high concentration by proto-metabolism determined the 
opportunities for further biosynthesis, thus establishing 
themselves as the precursors of anabolism. 

Among the five network-autocatalytic fixation path- 
ways, the CBB pathway is unique in not passing through 
any universal anabolic precursors. When used as a fix- 
ation pathway, CBB reactions must thus be connected 
to the rest of anabolism through the reverse of sev- 
eral reactions in the gluconeogenic pathways connect- 
ing 3-phosphoglycerate (3PG) to pyruvate. Pyruvate 
is then connected to the remaining precursors through 
partial TCA sequences. Glyoxylate produced from
2-phosphoglycolate during photorespiration may alternatively
be converted directly to glycine and serine (see Fig. [4]).

FIG. 5: The projection of the complete network for core carbon anabolism onto its CHO components. Phosphorylated
intermediates and thioesters with Coenzyme-A are not shown explicitly. The bipartite graph notation used to show reaction
stoichiometry is explained in App. [X]. Arcs of the reductive citric acid cycle and gluconeogenesis are bold, showing that these
pathways pass through the universal biosynthetic precursors. The Wood-Ljungdahl (labeled WL) pathway, without its cofactors
and reductants shown, is represented by the last reaction of the acetyl-CoA synthase, which is the inverse of a disproportionation.
Abbreviations: acetate (ACE); pyruvate (PYR); oxaloacetate (OXA); malate (MAL); fumarate (FUM); succinate (SUC);
α-ketoglutarate (AKG); oxalosuccinate (OXS); isocitrate (ISC); cis-aconitate (CAC); citrate (CIT); malonate (MLN); malonate
semialdehyde (MSA); 3-hydroxypropionate (3HP); acrolyate (ACR); propionate (PRP); methylmalonate (MEM); succinate
semialdehyde (SSA); 4-hydroxybutyrate (4HB); crotonate (CRT); 3-hydroxybutyrate (3HB); acetoacetate (AcACE); butyrate
(BUT); hydroxymethyl-glutarate (HMG); glyoxylate (GLX); methyl-malate (MML); mesaconate (MSC); citramalate (CTM);
glycerate (GLT); glyceraldehyde (GLA); dihydroxyacetone (DHA); fructose (FRC); erythrose (ERY); sedoheptulose (SED);
xylulose (XYL); ribulose (RBL); ribose (RIB).

Among the remaining loop-fixation pathways, only
rTCA passes through all five anabolic pillars. Through
its partial overlap with rTCA, DC/4HB passes through
four, excluding α-ketoglutarate. The 3HP-bicycle further
bypasses oxaloacetate, while the 3HP/4HB loop and WL
include only acetyl-CoA. All of the latter pathways
require anaplerotic reactions in the form of incomplete
(either oxidative or reductive) TCA arcs; when these
combine (in various ways) with WL carbon fixation, they are
known collectively as the reductive acetyl-CoA pathways.
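
The pillar counts given in the last two paragraphs can be written down as sets, which makes explicit which precursors each pathway must still reach through anaplerotic reactions. This is a minimal sketch that transcribes the counts as stated in the text (including the statement that 3HP/4HB and WL include only acetyl-CoA); the names `PILLARS`, `ON_PATHWAY` and `anaplerotic_targets` are illustrative, and succinate and succinyl-CoA are treated as one entry.

```python
# The five "pillars of anabolism" named in the text.
PILLARS = frozenset({
    "acetyl-CoA",
    "pyruvate",
    "oxaloacetate",
    "succinyl-CoA",
    "alpha-ketoglutarate",
})

# Pillars that each fixation pathway passes through, as counted in the text.
ON_PATHWAY = {
    "rTCA": PILLARS,
    "DC/4HB": PILLARS - {"alpha-ketoglutarate"},
    "3HP bicycle": PILLARS - {"alpha-ketoglutarate", "oxaloacetate"},
    "3HP/4HB": frozenset({"acetyl-CoA"}),
    "WL": frozenset({"acetyl-CoA"}),
    "CBB": frozenset(),
}


def anaplerotic_targets(pathway):
    """Pillars the pathway does not pass through and must reach by other arcs."""
    return sorted(PILLARS - ON_PATHWAY[pathway])


if __name__ == "__main__":
    for name in ON_PATHWAY:
        missing = anaplerotic_targets(name)
        print(f"{name:12s} on-pathway: {len(ON_PATHWAY[name])}  missing: {missing}")
```

Only rTCA returns an empty list, which is the sense in which the text treats passage through all precursors as a possible indicator of antiquity.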

The most parsimonious explanation for the universality
of the TCA arcs as anaplerotic reactions is lock-in
by downstream anabolic pathways, to which metabolism
was committed by the time carbon-fixation strategies
diverged. This is a direct extension of the metabolism-first
assumption that anabolic pathways themselves formed
around proto-metabolic selection of the rTCA
intermediates. 25 (A similar but later form of commitment has
been argued to convert basal gene regulatory networks
in metazoan development into kernels, which admit no
variation and act as constraints on subsequent
evolutionary dynamics [108, 109].) If lock-in provides the correct
interpretation of TCA universality, then much of the
burden of accounting for the inventory of small metabolites
is shifted away from Darwinian selection for function in
a post-RNA world, and onto constraints of biosynthetic
simplicity and network context. We show below that
phylogenetic reconstruction is consistent with a selective role
for rTCA cycling in the root metabolism of cellular life,
though only as part of a larger network than the modern
rTCA cycle.

25 Harold Morowitz summarizes this assumption with the phrase
metabolism recapitulates biogenesis [6].

FIG. 6: The pillars of anabolism, showing lipids, sugars,
amino acids, pyrimidines and purines, and tetrapyrroles from
either succinate or AKG. Molecules with homologous local
chemistry are at opposite positions on the circle. Oxidation
states of internal carbon atoms are indicated by color (red =
oxidized, blue = reduced).

B. Modularity in the internal structure and
mutual relationships of the known fixation pathways

Fig. [5] shows that the number of molecules and
reactions required to include all carbon fixation pathways is
much smaller than might have been expected from their
nominal diversity, because many reactions are used in
multiple pathways, and all pathways remain close to the
universal precursors. We have already noted in the
previous section that this re-use goes beyond the
requirements of autocatalysis, to the anaplerotic role of rTCA
arcs adapting variant fixation pathways to an invariant
set of biosynthetic precursors.

The aggregate network also shows many kinds of
structure: clusters, concentric rings, and ladders reflecting
parallel sequences of the same inputs and outputs in
different pathways. We will show in this section that these
result from re-use of local-group chemistry in
transformations of distinct molecules.

At the end of the section we will describe a third form
of re-use not represented in the aggregate graph. The
folate-mediated direct C1 reduction sequence of
Wood-Ljungdahl, responsible for the methyl group in the WL
disproportionation reaction in Fig. [5], is also found as a
free-standing fixation pathway across the bacterial tree,
often as one component in a disconnected autotrophic
network using one of the loop fixation pathways as its
other component.

Because of such extensive redundancy, little
innovation is required to explain the extant diversity of carbon
fixation. All known carbon fixation strategies can be
described as assemblies of a small number of
strongly-defined modules, which govern not only the function of
pathways, but also their evolution.

1. Modularity in carbon fixation loops from re-use of
pathway segments

Fig. [7] shows the sub-network from Fig. [5]
containing the four loop-autotrophic carbon fixation pathways
that pass through some or all universal precursors,
together with reactions in the glyoxylate shunt. The four
loop pathways are shown in four colors, with the organic
pathway-intermediates (but not environmental
precursors or reductants) highlighted.

The figure shows that these pathways re-use
intermediates by combining entire pathway segments. The
combinatorial assembly of these segments is possible because
they all pass through acetate (as acetyl-CoA) or succinate
(usually as succinyl-CoA), and all except the second loop
of the 3HP bicycle pass through both. Thus the
conserved reactions among the autocatalytic loop
carbon-fixation pathways are shared within strictly preserved
sequences, which have key molecules as the boundaries at
which segments may be combined.



2. Homologous local-group chemistry across pathway 
segments 

FIG. 7: The four loop carbon-fixation pathways that pass
through some or all of the universal biosynthetic precursors,
from the graph of Fig. [5]. rTCA is black, DC/4HB is red,
3HP-bicycle is blue, and 3HP/4HB is green. The one aldol
reaction from the glyoxylate shunt that is not part of the
3HP-bicycle is shown in fine lines. The module-boundary
nature of acetate (ACE) and succinate (SUC) is shown by the
intersection of multiple paths in these compounds. Radially
aligned reactions are homologous in local-group chemistry;
deviations from strict homology in different pathways appear
as excursions from concentric circles.

In addition to the re-use of complete reactions in
pathway segments, variant carbon-fixation pathways have
extensively re-used transformations at the level of local
functional groups. The network of Fig. [7] is arranged in
concentric rings, in which the arcs of the rTCA cycle align
with the 3HP or 4HB pathways, or with the mesaconate
arc of the 3HP bicycle. The "ladder" structure of
inputs and outputs of reductant (H2) or water between
these rings shows the similar stoichiometric progression
in these alternative pathways. Fig. [8] decomposes the
aggregate network into two pairs of short-molecule and
long-molecule arcs, and the mesaconate arc, and shows
the pathway intermediates in each arc. The figure makes
clear that, both within the arcs of the loop pathways,
and between alternate pathways, the type, sequence, and
position of reactions are highly conserved. In particular,
the reduction sequence from α-ketones or semialdehydes,
to alcohols, to isomerization through enoyl
intermediates, is applied to the same bonds on the same carbon
atoms from input acetyl moieties in rTCA, 3HP, and
4HB pathways, and to analogous functional groups in
the bicycle. Finally, in the cleavage of both citryl-CoA
and citramalyl-CoA, the bond that has been isomerized
through the enoyl intermediate is the one cleaved to
regenerate the network catalyst.
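
The shared "carbonyl, then alcohol, then enoyl" progression can be checked mechanically once each arc is written as a sequence of functional-group states. The sketch below is an illustration only: the state labels are a simplified reading of the CHO projection (an assumption of this example, not an annotation taken from the figures), the abbreviations follow the caption of Fig. 5, and all function and variable names are hypothetical.

```python
# Ketones and semialdehydes are both treated as "carbonyl" starting points.
CARBONYL = {"ketone", "semialdehyde"}
MOTIF = ("carbonyl", "alcohol", "enoyl")

# Simplified state of the active carbon along each arc (assumed labels).
ARCS = {
    "rTCA short arc": [("OXA", "ketone"), ("MAL", "alcohol"),
                       ("FUM", "enoyl"), ("SUC", "saturated")],
    "3HP arc": [("MSA", "semialdehyde"), ("3HP", "alcohol"),
                ("ACR", "enoyl"), ("PRP", "saturated")],
    "rTCA long arc": [("OXS", "ketone"), ("ISC", "alcohol"),
                      ("CAC", "enoyl"), ("CIT", "alcohol")],
    "4HB arc": [("SSA", "semialdehyde"), ("4HB", "alcohol"),
                ("CRT", "enoyl"), ("3HB", "alcohol")],
}


def coarse(state):
    """Collapse ketone/semialdehyde into a single 'carbonyl' class."""
    return "carbonyl" if state in CARBONYL else state


def motif_start(arc):
    """Index where the carbonyl -> alcohol -> enoyl motif begins, or None."""
    states = [coarse(state) for _, state in arc]
    width = len(MOTIF)
    for i in range(len(states) - width + 1):
        if tuple(states[i:i + width]) == MOTIF:
            return i
    return None


def fate_after_enoyl(arc):
    """State that follows the first enoyl intermediate, or None."""
    states = [state for _, state in arc]
    if "enoyl" not in states:
        return None
    position = states.index("enoyl")
    return states[position + 1] if position + 1 < len(states) else None


if __name__ == "__main__":
    for name, arc in ARCS.items():
        print(f"{name:15s} motif at {motif_start(arc)}, "
              f"then {fate_after_enoyl(arc)}")
```

Under these assumed labels all four arcs contain the motif, and they differ only in what follows the enoyl intermediate: reduction in the short-molecule arcs, re-hydration in the long-molecule arcs.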

Even the distinctive step to crotonyl-CoA in the 4HB
pathway creates an aconate-type intermediate, and the
enzyme responsible has high homology to the
acrolyl-CoA synthetase [110, 111], whose output (acrolyl-CoA)
follows the standard pattern. Only the position of the
double bond breaks the strict pattern in crotonyl-CoA,
and the abstraction of the un-activated proton required
to produce this bond requires the unique ketyl-radical
intermediate [112]. From crotonyl-CoA, the sequence to
3-hydroxybutyrate is then followed by a surprising
oxidation and re-hydration, resulting in a 5-step,
redox-neutral sequence. The net effect of this sequence is
to shift the carbonyl group (of succinate semialdehyde,
SSA) by one carbon (in acetoacetate, AcACE). Because
the 4HB pathway takes in no new CO2 molecules, this
isomerization enables regeneration of the network
catalysts in the same way the reduction/aldol-cleavage
sequence enables regeneration for rTCA or 3HP.

FIG. 8: Comparison of redundant reactions in the loop
carbon fixation pathways. Pathways are divided into
"long-molecule" (upper-ranks) and "short-molecule" (lower-ranks)
segments; long-molecule segments occupy roughly the
upper-right half-plane in Fig. [7], and abbreviations are as in Fig. [5].
Molecule forms are shown next to the corresponding tags.
Bonds drawn in red are the active acetyl or semialdehyde
moieties in the respective segments. Vertical colored bars align
homologous carbon states. The yellow block shows retro-aldol
cleavages of citrate or citramalate. Two molecules are shown
beneath the tag CRT (crotonate): the greyed-out molecule in
parentheses would be the homologue to the other
aconitase-type reactions; actual crotonate (full saturation) displaces the
double bond by one carbon, requiring the abstraction of the
α-proton in 4-hydroxybutyrate via the ketyl-radical
mechanism that is distinctive of this pathway.

Duplication of reaction sequences in diverse fixation
pathways results from retention of gene sets as organism
clades diverged. Duplication of local-group chemistry in
diverse reactions has resulted (at least in most cases)
from retention of reaction mechanisms as enzyme
families diverged. All enoyl intermediates are produced by a
widely diversified family of aconitases [113], while
biotin-dependent carboxylations are performed by homologous
enzymes acting on pyruvate and α-ketoglutarate [86], 26
and substrate-level phosphorylation and
thioesterification are similarly performed by homologous enzymes on
citrate and succinate in rTCA [SHI [SO]. 27 The wide
coverage of a few reaction types may reflect their early
establishment by promiscuous catalysts [116], followed by
evolution toward increasing specificity as intermediary
metabolic networks expanded and metabolites capable
of participating in carbon fixation diversified.

A functional identification of modules that seeks to
minimize influence from historical effects (such as
lock-in) has been carried out by Noor et al. [117], and
identifies similar module boundaries. Using as data the
first three numbers of the EC classification of enzymes
- which distinguish reaction types but coarse-grain over
both substrate specificity and enzyme homology - they
show that many pathways in core metabolism are the
shortest routes possible between inputs and products.
Where their analysis overlaps with the pathways shown
here, many of their minimal sequences overlap with the
modules in Fig. [7], as well as with others in
gluconeogenesis which we do not consider here. Thus, returning
to metabolism-first premises [6], it may be that
historical retention of a few reaction types reflects facility of
the substrate-level chemistry, and that this has placed
time-independent constraints on evolution.

The functional-group homology shown in Fig. [8] allows
us to separate stereotypical sequences of widely
diversified reactions from key reactions that distinguish
pathways. The stereotypical sequences lie downstream of
reactions such as the ferredoxin-dependent carbonyl
insertion (rTCA), or biotin-dependent carboxylation (3HP),
which are associated with highly conserved enzymes or
cofactors. The downstream reactions are also more
"elementary", in the sense that they are common and widely
diversified in biochemistry, compared to the
pathway-distinguishing reactions.

3. Association of the initiating reactions with
transition-metal sulfide mineral stoichiometries and other
distinctive metal-ligand complexes

The observation that alternative fixation pathways are
not distinguished by their internal reaction sequences,
but primarily by their initiating reactions, suggests that
these reactions were the crucial bottlenecks in evolution,
and perhaps reflect the limiting diversity of chemical
mechanisms for carbon bond formation. 28 The
distinctive use of metals in the (often highly conserved)
enzymes and cofactors for these initiating reactions may
further suggest a direct link between prebiotic mineral
and metal-ligand chemistry [118], and constraints
inferable from the long-term structure of later cellular
evolution.

26 Similar to the synthesis of citryl-CoA we separate here the
carboxylation of α-ketoglutarate from the subsequent reduction of
oxalosuccinate to isocitrate - performed by a single enzyme
in most organisms - because it is argued to be the ancestral
form [114, 115].

27 However, the thioesterification of propionate is performed by
distinct enzymes in bacteria and archaea, an observation that has
been interpreted to suggest convergent evolution [82, 99].

28 Mechanisms of organosynthesis in aqueous solution are especially
limited by the instability of radical intermediates, which may be
stabilized by association with metal centers.

Several enzyme iron-sulfur centers have been
recognized [119, 120] to use strained versions of the unit
cells found in Fe-S minerals, particularly Mackinawite
and Greigite. These are particular instances within a
wider use of transition-metal-sulfide chemistry in
core-metabolic enzymes.

Pyruvate:ferredoxin oxidoreductase (PFOR), which
catalyzes the reversible carboxylation of acetyl-CoA to
pyruvate, contains three [Fe4S4] clusters and a thiamin
pyrophosphate (TPP) cofactor. The [Fe4S4] clusters and
TPP combine to form an electron transfer pathway into
the active site, and the TPP also mediates carboxyl
transfer in the active site [84].

The bifunctional carbon monoxide
dehydrogenase/acetyl-CoA synthase (CODH/ACS)
enzyme that catalyzes the final acetyl-CoA synthesis
reaction in the WL pathway employs even more
elaborate transition-metal chemistry. Like PFOR, this
enzyme uses [Fe4S4] clusters for electron transfer, but
its active sites contain additional, more unusual metal
centers. The CODH active site contains an asymmetric
[Ni-Fe4S5] cluster on which CO2 is reduced to CO [73],
while the ACS active site contains a Ni-Ni-[Fe4S4]
cluster on which CO (from CODH) and a methyl group
from folates are joined to form acetyl-CoA [76-78]. It
was originally thought that a variant form of the ACS
active site contains a Cu-Ni-[Fe4S4] cluster [121, 122],
but it was subsequently shown that the Cu-containing
cluster represents an inactivated form of ACS [77].
Similarly, it has also been shown that the open form
of the Ni-Ni ACS may switch to a closed, inactivated
form by exchanging one of the nickel atoms for a zinc
atom [76]. Finally, methyl-group transfer from the
methyl-pterin to the ACS active site is mediated by the
corrinoid iron-sulfur protein (CFeSP) containing the
cobalt-tetrapyrrole cofactor cobalamin [79, 80]. This
transfer process involves cycling between oxidation
states of both the cobalt and one of the nickel atoms
in the Ni-Ni iron-sulfur cluster of ACS. In the first part
of the transfer cobalt becomes oxidized from the Co(I)
to the Co(III) state. The subsequent donation of the
methyl-group to the ACS active site simultaneously
reduces cobalt back to the Co(I) state and oxidizes the
active nickel from the Ni(0) to the Ni(II) state. Finally,
in the release of acetyl-CoA from the ACS the nickel is
reduced back to the Ni(0) state, allowing the cycle to
start over [50].
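
The oxidation-state bookkeeping in the methyl-transfer cycle just described can be checked step by step. The following is a minimal sketch that encodes only the three transitions named in the text as formal oxidation states; the step labels and the names `METHYL_TRANSFER_CYCLE`, `run_cycle` and `net_change` are illustrative, and nothing here models the actual enzyme mechanism.

```python
# Each step lists, per metal, the (before, after) formal oxidation state,
# following the sequence described in the paragraph above.
METHYL_TRANSFER_CYCLE = [
    ("methyl uptake by cobalamin in CFeSP", {"Co": (1, 3)}),
    ("methyl donation to the ACS active site", {"Co": (3, 1), "Ni": (0, 2)}),
    ("release of acetyl-CoA from ACS", {"Ni": (2, 0)}),
]


def run_cycle(start, steps):
    """Apply each step in order, checking that the 'before' states match."""
    state = dict(start)
    for label, changes in steps:
        for metal, (before, after) in changes.items():
            if state[metal] != before:
                raise ValueError(
                    f"{label}: expected {metal}({before}), "
                    f"found {metal}({state[metal]})"
                )
            state[metal] = after
    return state


def net_change(steps):
    """Sum of oxidation-state changes per metal over the whole cycle."""
    totals = {}
    for _, changes in steps:
        for metal, (before, after) in changes.items():
            totals[metal] = totals.get(metal, 0) + (after - before)
    return totals


if __name__ == "__main__":
    resting = {"Co": 1, "Ni": 0}
    print("state after one turn:", run_cycle(resting, METHYL_TRANSFER_CYCLE))
    print("net change per metal:", net_change(METHYL_TRANSFER_CYCLE))
```

A net change of zero for both metals is what allows the cycle to start over from the same resting state.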

Perhaps not surprisingly, all these examples of
metal-cluster enzymes concern catalysis not just of the
formation of C-C bonds, but of the incorporation of the small
gas-phase molecule CO2. In general, enzymes involved in
the processing of small gas-phase molecules (including H2
and N2) are among the most distinctive enzymes in biology
- all but one of the known nickel-containing enzymes
belong to this group [123] - always containing highly
complex metal centers in their active sites [124-129]. This
indicates both the difficulty of controlling the catalysis
of these reactions, and the importance of understanding
their functions in the context of the origin of life [120].



4. Complex network closures: diversity and opportunity
created by aldol reactions 

The network closures that retain carbon flux and en- 
able autocatalysis in rTCA, DC/4HB, and 3HP/4HB 
pathways are all topologically rather simple, and are 
quite similar due to the homology among most of the 
pathway intermediates. Their module boundaries also 
are all defined by acetate and succinate, and at least in 
the case of acetate, were probably facilitated by its
multiple pre-existing roles as the redox-drain of the rTCA
cycle [33] and the starting point for both isoprenoid and
fatty-acid lipid biosynthesis.

In contrast, the topology of the 3HP-bicycle appears
complex, and perhaps an improbable solution to the
problem of recycling all carbon flux through core
pathways. 29 If we are to argue that the emergence or
evolution of such network closures is facilitated by a form of
modularity, it must exist at the level of reaction
mechanisms that render their discovery less improbable. For
the 3HP bicycle and the related glyoxylate shunt - and to
a lesser degree also for rTCA - the mechanism of interest
is the aldol reaction.

The aldol reaction is an internal oxidation-reduction
reaction, which means that it exploits residual free
energy from organosynthesis, and also that it can take place
independently of external electron donors or acceptors.
Many aldol reactions are also kinetically facile,
occurring at appreciable rates without the aid of catalysts. We
therefore expect that among compounds capable of
participating in them, aldol reactions would have been
common in the prebiotic world, providing opportunities for
pathway generation. Since their diversity is difficult to
suppress except by special mechanisms [130], we expect
that potential aldol reactions among metabolites would
either have become regulated (perhaps through
phosphoryl occupation of hydroxyl groups) or else incorporated
into actively-used biochemical pathways.

29 The many parallel connections in networks such as the
3HP-bicycle or the pentose-phosphate network (see Fig. [5]) make the
problem of metabolite interconversion complex [61] in a different
way than arises in the metabolic "bowtie". Optimal conversion
within the bowtie consists of finding common molecular
"divisors" of input and output metabolites, and so can be seen even
in the number theory of highly abstracted string chemistries [44].
The fact that short paths exist from most metabolites to a small
set of building blocks is, in our interpretation, a reflection of the
prior role of the core (where the building blocks are first
created) in defining the possibilities for later anabolism and thus
the metabolites reached by the bowtie.

Aldol reactions are important generators of diversity
in organic chemistry, notorious for the very complex
network known as the formose reaction [131-134],
initiated from formaldehyde and glycolaldehyde. Many
aldol reactions are possible for sugars, and the reductive
pentose phosphate pathway is indeed a network of
selected aldol condensations and cleavages among
sugar-phosphates [61].




FIG. 9: The 3-hydroxypropionate bicycle (blue) and the 
glyoxylate shunt (orange) compared. Directions of flow are 
indicated by arrows on the links to acetate (ACE). The com- 
mon core that enables flux recycling in both pathways is the 
aldol reaction between glyoxylate (GLX), acetate, and malate 
(MAL). The four other aldol reactions (labeled by their 
cleavage direction) are from isocitrate (ISC), methyl-malate 
(MML), citrate (CIT), and citramalate (CTM). Malate is a 
recycled network catalyst in both pathways. Carbon is fixed 
in the 3HP-bicycle as pyruvate (PYR), so the cycle only be- 
comes autocatalytic if pyruvate can be converted to malate 
through anaplerotic (rTCA) reactions. 



Fewer aldol reactions are possible among intermediates 
of the rTCA cycle and their homologues such as methyl- 
malate or citramalate in other carbon-fixation pathways, 
but all possibilities are indeed used either in intermedi- 
ary metabolism or in carbon fixation. Fig. [9] shows the 
3-hydroxypropionate bicycle and the closely-related gly- 
oxylate shunt. In both pathways, the network topologies 
that regenerate all carbon flux or achieve autocatalysis 
are created by aldol reactions. The retention of carbon
within the shunt appears to be a reason for its widespread
distribution and frequent use [105, 135, 136], even when
energetically more-efficient pathways such as the Krebs
cycle exist as alternatives within organisms.

5. Re-use of the direct C1 reduction pathway and hybrid
fixation strategies

A unique form of re-use is found for the sequence of
reactions that directly reduce one-carbon (C1) groups
on pterin cofactors. We have argued elsewhere [H]
that even when a complete, autotrophic WL pathway
is not present due to the loss of the oxygen-sensitive
CODH/ACS enzyme, the direct C1-reduction sequence
on pterins is often still present and being used as a
partial fixation pathway. The reaction sequence supplies the
diverse methyl-group chemistry mediated by
S-adenosyl-methionine, and the direct synthesis of glycine and serine
from methylene groups, reductant, and ammonia. These
then serve as precursors to cysteine and tryptophan. The
pathway may exist in either a complete (8-reaction) or a
previously-unrecognized but potentially widespread
(7-reaction) form that involves uptake on N5 rather than
N10 of THF [H] (see Fig. [Z]).

The widely distributed and diversified form of direct
C1 reduction functions much as auxiliary catabolic
pathways function in mixotrophs [5], operating in parallel to
an independent "primary" fixation pathway, with the
primary and the direct-C1 pathway supplying carbon to
different subsets of core metabolites. In many cases where
the CODH/ACS is lost, this loss disconnects the primary
and direct-C1 pathway segments, creating the novel
feature of a disjoint carbon fixation pathway. 30 The
existence of parallel fixation pathways in autotrophs had
previously been recognized only in one (relatively
late-branching) γ-proteobacterium, the uncultured
endosymbiont of the deep-sea tube worm Riftia pachyptila, which
was found to be able to use both the rTCA and CBB
cycles [137]. In that case, however, the two pathways
are not disjoined, but rather connected through
intermediates in the glycolytic/gluconeogenic pathways. In
addition, the capacity for using either cycle is thought
to reflect an ability to adapt to variation in the
availability of environmental energy sources, with an
apparent up-regulation of the more efficient rTCA cycle
under energy-poor conditions [137]. Our phylogenetic
reconstructions [41], however, indicate that parallel disjoint
pathways were the majority phenotype in the deep tree of
life, in which a reductive C1 sequence to glycine and
serine is preserved in combination with rTCA in
Aquificales and Nitrospirae, with CBB in Cyanobacteria, with
the 3HP bicycle in Chloroflexi (all bacteria), and with
DC/4HB in Desulfurococcales and Acetolobales, and the
3HP/4HB cycle in Sulfolobales (all archaea). In contrast,
the full WL pathway is found only in a subset of
lineages of bacteria (especially acetogenic Firmicutes) and
archaea (methanogenic Euryarchaeota).

30 A secondary connection between the two components may
exist in the form of oxidative conversion of 3-phosphoglycerate to
serine. This connection may lead subsequently to the loss of
direct-C1 reduction as a fixation route, as in the proteobacteria,
or it may release a constraint leading to change in pterin cofactor
chemistry as in methanogens, discussed below.

Apparently as a result of the flexibility enabled by 
parallel carbon inputs to core metabolites, the direct 
C1 reduction sequence is more universally distributed
than any of the other loop-networks (whether paired
with C1 reduction or used as exclusive fixation pathways),
or than the complete WL pathway. The status of
the pterin-mediated sequence as a module appears more 
fundamental than its integration into the full WL path- 
way, and comparable to the arcs identified within rTCA, 
which may function as parts of fixation pathways or al- 
ternatively as anaplerotic extensions to other pathways. 
The two types of pathways also serve similar functional 
roles in our phylogenetic reconstruction of a root carbon- 
fixation phenotype, as the key components enabling and 
selecting the core anabolic precursors. 

The reductive synthesis of glycine furnishes a potent 
reminder of the importance of taking evolutionary con- 
text into account when interpreting results from studies 
of metabolism. Historically much of our understanding 
of biochemistry comes from the study of human (or more 
generally mammalian) physiology, which can introduce 
biases. We noted above the example of the reductive 
citric acid cycle, which is sometimes called the "reverse" 
citric acid cycle even though it was ancestral to the
oxidative form. Similarly, the "glycine cleavage system" (GCS)
was originally studied in rat and chicken livers [138],
before being recognized as phylogenetically widespread.
The distribution of this system is now known to be nearly
universal across the tree of life (with methanogens
being the main systematic exception, for reasons explained
elsewhere [41]), suggesting it was present already in the
LUCA. The lipoyl-protein based system has long been
known to be fully reversible [138-140], and has nearly
neutral thermodynamics at physiological conditions [74].
Thus, the LUCA could have used this system either to
synthesize or to cleave glycine. From this perspective the
former possibility (synthesis) seems a more likely
interpretation, even without additional data. Absent the
interpretation bias from mammalian physiology, the choice
between these alternatives might have become clear much
sooner.



C. A coarse-graining of carbon-fixation pathways 

We can combine all the previous observations on
modularity in carbon-fixation - the sharing of arcs between
loop pathways, the re-use of TCA and reductive C1
sequences to complete the set of anabolic pillars - to perform
a "coarse-graining" of carbon-fixation. Combining the
decomposition of Fig. [7] with the gluconeogenic and WL
pathways in Fig. [5], we may list the seven modules from
which all known autotrophic carbon-fixation pathways
are assembled: 1) direct one-carbon reduction on folates
or related compounds, with or without the CODH/ACS
terminal reaction of WL; 2) the short-molecule rTCA arc
from acetyl-CoA to succinyl-CoA; 3) the long-molecule
rTCA arc from succinyl-CoA to citryl-CoA; 4) the
gluconeogenic/reductive pentose-phosphate pathway, with or
without the RubisCO reaction of CBB; 5) the 3HP arc
from acetyl-CoA to succinyl-CoA; 6) the long-molecule
4HB pathway from succinyl-CoA to acetoacetyl-CoA;
7) the glyoxylate-shunt/mesaconate pathway to
citramalate, which is the long-molecule loop in the 3HP
bicycle. Fig. 10 shows the summary of these modules at the
pathway level, as well as their different combinations to
form complete autotrophic carbon-fixation pathways.

The importance of including glycine in the set of an- 
abolic pillars immediately becomes clear in this coarse- 
grained view. The general similarity among different 
carbon-fixation pathways increases significantly, while 
finer distinction between forms becomes possible. In 
particular, both of the pathways that have been most 
commonly discussed in the context of ancestral carbon- 
fixation and the origin of life, WL and rTCA [7J [STJ E9 
155] . separate into deep- and late-branching forms. The 
increased similarity of the deep-branching forms of these 
pathways suggests an underlying template that combines 
both WL and rTCA in a fully connected network. WL 
and rTCA differ from this linked network by single re- 
actions associated either with energy (ATP) economy or 
oxygen (or perhaps other oxidant) sensitivity. Combining information
on the synthesis, structural variation, ecology and phylogenetics of
the pterin molecules upon which direct C1 reduction is performed
similarly suggests a distinction between the acetogenic (bacterial)
and methanogenic (archaeal) forms of WL associated with energy
economy [41]. A "proto-tree" of carbon-fixation emerges from the
pooling of these different observations, which in turn makes it
possible to reconstruct a complete phylometabolic tree of
carbon-fixation, as discussed in detail in Sec. III D below.

FIG. 10: Coarse-grained summary of carbon-fixation pathways. The left
panel shows the six pathways as they are known from extensive
laboratory characterization. Including glycine along with the
anabolic pillars as the molecules that must be reached in
carbon-fixation then adds resolution, allowing finer distinctions
among forms and generally increasing their similarity. As a result,
underlying evolutionary templates and patterns begin to emerge. The
panel on the bottom right shows the modules from which all
carbon-fixation pathways are constructed, as outlined in the main
text.



1. How the inventory of elementary modules has 
constrained innovation and evolution 

The essential invariance across the biosphere of the 
seven sub-networks listed above allows us to represent 
all carbon-fixation phenotypes in terms of the presence or 
absence, connectivity, and direction of these basic mod- 
ules. In this representation, metabolic innovation at the 
modular level retains the character of individual discrete 
events, even if the pathway segments involved incorporate multiple
genes. In cases where multiple genes must be acquired to constitute
a module, as in the innovation of the 4HB pathway, the genes may
first be assembled at higher levels of metabolism (e.g. fermentative
secondary metabolism), after which their incorporation into a
fixation pathway appears as a single innovation.

Because the module boundaries are defined by particular (often
universal) molecular species (e.g., acetyl-CoA, succinyl-CoA, and
ribulose-1,5-bisphosphate) it often remains true that innovation can
be traced to the change in single genes. This is true for the loss
of the CODH/ACS from acetyl-CoA phenotypes, the innovation of
RubisCO in CBB bacteria, or the loss of substrate-level
phosphorylation to acetyl-CoA or succinyl-CoA in acetogens. A case
with only slightly greater complexity is the apparently repeated,
convergent evolution of an oxidative pathway to form serine from
3-phosphoglycerate (3PG), which involves three common and widely
diversified reactions: a dehydrogenation, a reductive transamination,
and a dephosphorylation.

At the module level, we may represent changes in 
carbon fixation pathways between closely-related phe- 
notypes in terms of single connections, disconnections, 
or overall changes of direction within the subsets of the 
seven modules which are present. The change of direction 
within modules is usually complete, even if it is partial or 
intermediate at the level of whole pathways. An example is the
switch from autotrophic rTCA to fermentative TCA using a reductive
small-molecule arc and an oxidative large-molecule arc [91]. Such
fermentative pathways may alternate with fully oxidative TCA (Krebs)
cycling, and they often occur in organisms that carry homologues to
genes for both oxidative and reductive pathway directions [H2HSI].
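
As an illustration only, the module-level representation described
above (presence or absence, and direction, of each of the seven
modules) can be sketched in a few lines of Python. The identifier
strings, the Direction enum, and the function module_changes are
hypothetical names introduced here; the two example phenotypes encode
only the rTCA-to-fermentative-TCA switch mentioned in the text and are
not a curated description of any organism.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Direction(Enum):
    REDUCTIVE = "reductive"
    OXIDATIVE = "oxidative"


# The seven modules listed in the text; the identifier strings are
# illustrative labels, not notation from the source.
MODULES = (
    "c1_reduction",           # 1) direct one-carbon reduction on folates
    "rtca_short_arc",         # 2) acetyl-CoA -> succinyl-CoA
    "rtca_long_arc",          # 3) succinyl-CoA -> citryl-CoA
    "gluconeogenesis_ppp",    # 4) gluconeogenic / reductive pentose phosphate
    "3hp_arc",                # 5) 3HP arc, acetyl-CoA -> succinyl-CoA
    "4hb_arc",                # 6) 4HB arc, succinyl-CoA -> acetoacetyl-CoA
    "glyoxylate_mesaconate",  # 7) glyoxylate-shunt / mesaconate pathway
)

# A phenotype maps each module that is present to its direction.
Phenotype = Dict[str, Direction]


def module_changes(a: Phenotype, b: Phenotype) -> List[Tuple[str, str]]:
    """List the module-level events (lost, gained, reversed) from a to b."""
    events = []
    for module in MODULES:
        in_a, in_b = module in a, module in b
        if in_a and not in_b:
            events.append((module, "lost"))
        elif in_b and not in_a:
            events.append((module, "gained"))
        elif in_a and in_b and a[module] is not b[module]:
            events.append((module, "reversed"))
    return events


# Example from the text: autotrophic rTCA versus fermentative TCA with a
# reductive small-molecule arc and an oxidative large-molecule arc.
autotrophic_rtca = {
    "rtca_short_arc": Direction.REDUCTIVE,
    "rtca_long_arc": Direction.REDUCTIVE,
}
fermentative_tca = {
    "rtca_short_arc": Direction.REDUCTIVE,
    "rtca_long_arc": Direction.OXIDATIVE,
}
changes = module_changes(autotrophic_rtca, fermentative_tca)
print(changes)
```

In this encoding the switch registers as a single event, the reversal
of one module, even though the change is only partial at the level of
the whole pathway.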

An important exception to this pattern is the par- 
tial reversal of the formyl-to-methylene sequence on fo- 
lates, between its carbon-fixation role and its role in 
the catabolic cleavage of glycine. In Ref. [41] we refer to the
module formed by combining the GCS with the methylene-serine
transferase as the glycine cycle. The combination of the complex
free energy landscape provided by the folates [65] with the
reversibility and nearly neutral thermodynamics of the glycine cycle
[74, 138] permits a high degree of flexibility within this module.
Carbon can enter either directly through CO2, through serine (from
3PG), or through glycine (from glyoxylate), and from any of these
sources may be redirected to all of C1 chemistry. The topology of
the main reaction sequence is preserved in all these cases of
reversal, though new enzymes or cofactors may be recruited to reverse
some reactions. 31



D. Reconstructed evolutionary history

1. Phylogeny suggests little historical contingency of deep
evolution within the modular constraints

The small number of modules that contribute to carbon fixation, and
the even smaller number of "gateway" molecules that serve as
interfaces between most of them, permit free recombination into many
phenotypes satisfying the constraints of autotrophy. An important
consequence of free recombination is that the external constraint
(autotrophy) does not lock in dependencies within networks over
separations larger than the modules themselves. Homology across
intra-modular reaction sequences - especially if it is due to
catalytic promiscuity - further weakens any lock-in effect created by
selection for metabolic completeness. Through these mechanisms
modularity promotes innovation-sharing [141] and rapid and reliable
adaptation [18] to environmental conditions, but reduces standing
variation among individuals sharing a common environment.

As we reviewed in Sec. III, distinct carbon-fixation pathway modules
have very different couplings to the chemical environment. The
genome distributions reported in Ref. [41] show that they also have
very uneven phylogenetic distribution. (For example, TCA arcs and
intermediates, as well as direct C1-reduction, are nearly universally
distributed, while the 3HP arcs are restricted to specific bacterial
or archaeal clades living in alkaline environments.) Finally, we note
that not all module combinations consistent with autotrophy have been
observed in extant organisms.

By combining these observations it is possible to arrange
autotrophic phenotypes on a graph according to their degree of
similarity, and to assign environmental factors as correlates of
phenotypic changes over most links. The graph projects onto a tree
with very high parsimony and therefore almost no requirement to
invoke either horizontal gene transfer or convergent evolution from
distinct lineages. With a natural choice of root motivated by the
overlap with bacterial and archaeal phylogeny, links become directed
and environmental factors take on the interpretation of evolutionary
causes. The lack of reticulation in a tree of innovations in
autotrophy - at first surprising when compared to highly-reticulated
gene phylogenies [142] covering the same period - becomes sensible as
a record of invasion and adaptation to new chemical environments by
organisms capable of maintaining little long-standing variation.



2. A parsimony tree for autotrophic metabolism, and
causation on links

The tree of autotrophic carbon-fixation phenotypes from Ref. [41] is
shown in Fig. 11. All nodes in the tree satisfy the constraint that
all five universal anabolic precursors plus glycine can be
synthesized directly from CO2. We have defined parsimony by
requiring single changes over links at the level of pathway modules,
as explained above, rather than at the level of single genes, in
cases where the two criteria differ. (This definition separates the
evolution of genetic backgrounds, such as 4-hydroxybutyrate
fermentation, from the events at which organisms came to rely on
complete pathways for autotrophy.) A complete-parsimony tree for the
known phenotypes is not possible, so we chose a tree in which the
only violations are duplicate innovation of serine synthesis from
3-phosphoglycerate (common reactions and diversified enzyme
families), and duplication or transfer of the short-molecule 3HP
pathway (common environments).

The nodes in the tree of Fig. 11 are all phenotypes of extant
organisms, with one important exception, 32 which is the node
between the Aquificales branch and the Firmicutes/Archaea branch.
Aquificales and all phenotypes descending from them lack the
CODH/ACS enzyme, while Firmicutes and Archaea lack one or more
ATP-dependent acyl-CoA (citryl-CoA or succinyl-CoA) synthases.
Therefore, if we seek a connected tree of life, two changes - the
gain of one enzyme and loss of the other - are required to connect
these branches. Since any organism lacking both enzymes could not fix
carbon autotrophically, we have chosen the order of gain and loss so
that the intermediate node has both the CODH/ACS and the acyl-CoA
synthases. It therefore has both a complete WL pathway and an
autocatalytic rTCA loop, connected through their shared intermediate
acetyl-CoA. Losses (but not re-acquisitions) of either of these
enzymes occur at multiple points on the tree, and both have likely
explanations in either environmental chemistry or energetics. For
this reason and several others given below, although a parsimony
tree is (a priori) unrooted, we will regard the joint WL/rTCA
phenotype as not only a bridging node but the root of the tree of
autotrophs.

31 An example is the reversal of the complete rTCA cycle to the
oxidative Krebs cycle. The electron donor in rTCA, reduced
ferredoxin, is replaced by lipoic acid as an electron acceptor in the
Krebs cycle, in the TPP-dependent oxidoreductase reaction. The
enzymes catalyzing the retro-aldol cleavage of rTCA, which have
undergone considerable re-arrangement even within the reductive
world [89, 90], were further modified to the oxidative citryl-CoA
synthetase.

32 There is also one unimportant exception, which is the insertion
of an acetogenic phenotype with a facultative oxidative pathway to
serine at the root of the Euryarchaeota. Since methanogens use this
pathway, and since an acetogenic pathway lacking oxidative serine
synthesis is the most plausible ancestral form for all archaea as
well as for Firmicutes within the bacteria, we infer that such an
intermediate state did or does exist. This fixation pathway is
consistent with forms observed in extant organisms, and we expect
that such a phenotype either will be discovered or will result from
reclassification of genes in an existing organism.

FIG. 11: A parsimony-based reconstruction of the innovations linking
the major carbon-fixation phenotypes, from Ref. [41]. Nodes in the
tree are autotrophic phenotypes, following the coarse-grained
notation introduced in Fig. 10 and summarized in the legend. Grey
links are transitions in the maximum-parsimony phylometabolic
reconstruction, and yellow-highlighted regions in the diagrams are
innovations following each link. Organism names or clades in which
these phenotypes are found are given in black; fixation pathways
innovated along each link are shown in blue, and imputed
evolutionary causes are shown in red. $S^{n-}$ refers to sulfides of
different oxidation states. Dashed lines separate regions in which
the clades by phylometabolic parsimony follow standard phylogenetic
divisions. Abbreviations: formyl (HCO-); methylene (-CH2-);
acetyl-CoA (ACA); pyruvate (PYR); serine (SER); 3-phosphoglycerate
(3PG); glyceraldehyde-3-phosphate (GAP); fructose-1,6-bisphosphate
(F6B); ribose-phosphate (RIB); ribulose-phosphate (RBL); alkalinity
(ALK). Arrows indicate reaction directions; dashed lines connecting
3PG to SER indicate intermittent or bi-directional reactions.
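
The choice of bridging node between the Aquificales branch and the
Firmicutes/Archaea branch can be phrased as a small combinatorial
search. The sketch below is an illustration under stated assumptions,
not the reconstruction procedure of Ref. [41]: phenotypes are reduced
to the presence or absence of two enzyme classes, the autotrophy
constraint is reduced to "at least one of the two is present", and
the names changes, is_autotrophic and bridging_nodes are hypothetical.

```python
from itertools import chain, combinations

# Two enzyme classes that distinguish the branches in the text.
ENZYMES = ("CODH/ACS", "acyl-CoA synthases")


def changes(a: frozenset, b: frozenset) -> int:
    """Number of single-enzyme gains or losses separating two phenotypes."""
    return len(a ^ b)


def is_autotrophic(phenotype: frozenset) -> bool:
    # Simplifying assumption taken from the text: a node lacking both
    # enzymes could not fix carbon autotrophically.
    return len(phenotype) > 0


def bridging_nodes(a: frozenset, b: frozenset) -> list:
    """Autotrophic phenotypes exactly one change away from both a and b."""
    subsets = chain.from_iterable(
        combinations(ENZYMES, r) for r in range(len(ENZYMES) + 1)
    )
    return [
        node
        for node in map(frozenset, subsets)
        if changes(a, node) == 1
        and changes(node, b) == 1
        and is_autotrophic(node)
    ]


aquificales_like = frozenset({"acyl-CoA synthases"})  # lacks CODH/ACS
acetogen_like = frozenset({"CODH/ACS"})               # lacks the synthases

direct_distance = changes(aquificales_like, acetogen_like)
bridges = bridging_nodes(aquificales_like, acetogen_like)
print(direct_distance, [sorted(node) for node in bridges])
```

Of the two phenotypes that lie one change away from both branches,
only the one carrying both enzyme classes passes the autotrophy
filter, which is the joint WL/rTCA node adopted in the text.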



In the evolution of carbon fixation from a joint WL/rTCA root, the
primary division is between the loss of the CODH/ACS, resulting in
rTCA loop-fixation phenotypes, and the loss of the acetyl-CoA or
succinyl-CoA synthetases, resulting in acetogenic phenotypes. Very
low levels of oxygen permanently inactivate the CODH/ACS, so its
loss is probable even under microaerobic conditions. Although the
dominant mineral buffers for oxygen in the Archaean remain a topic
of significant uncertainty [143-146], it appears implausible that
molecular oxygen was the toxin responsible for loss of the CODH/ACS
much before the "Great Oxidation Event" (GOE). 33 Therefore the
sensitivity of the CODH/ACS to sulfides or perhaps other oxidants
(S. Ragsdale, pers. comm.) remains a possibly important factor in
the early divergences of carbon fixation.

Alternatively, among strict WL-anaerobes, the loss of 
citryl-CoA or succinyl-CoA synthetase saves one ATP per 
carbon fixed, and all acetogenic phenotypes break rTCA 
cycling only through the loss of one or the other of these 
enzymes. We therefore interpret the loss of rTCA cycling 
as a result of selection for energy efficiency. The failure 
to regain either of these enzymes by acetogens which sub- 
sequently also lost the CODH/ACS is perhaps surprising 
given the inferred homology of the ancestral citryl-CoA 
and succinyl-CoA synthetases [89, 90], but explains the
absence of rTCA cycling in either Firmicutes or any Ar- 
chaea. 

The remaining autotrophic phenotypes are derived 
from either rTCA cycling or acetogenesis in natural 
stages due to plausible environmental factors. Oxida- 
tive serine synthesis (from 3PG) is associated with the 
rise of the proteobacteria, whose differentiation in many 
features tracks the rise of oxygen and the transition 
to oxidizing rather than reducing environments. Ru- 
bisCO and subsequently photorespiration arise within 
the cyanobacteria. The innovation of the 3HP bicycle 
from the malonate pathway arises within the Chloroflexi. 
In both Firmicutes (bacteria) and the crenarchaea, 4- 
hydroxybutyrate (or closely related 4-aminobutyrate) 
fermentation is more or less developed. Closure of the 
fermentative arcs to form a ring, again driven by elimination
of the CODH/ACS [41], leads to the DC/4HB
pathway in Crenarchaeota, which is then specialized in 
the Sulfolobales to the alkaline 3HP/4HB pathway. The 
Euryarchaeota are distinguished by the absence of an 
alternative loop-fixation pathway to rTCA, so that all
members are either methanogens or heterotrophs. 

Similarly, the innovation of the 3HP pathways, using 
biotin, emerges as a specialization to invade extreme but 
relatively rare environments. A particularly interesting 
case is the modification of folates in archaea, leading from 
THF in ancestral nodes to tetrahydromethanopterin in 
the methanogens, which enables initial fixation of formate
(formed by hydrogenation of CO2) in an ATP-free
system [41, 65]. The root position of rTCA explains the
preservation of rTCA arcs both in reductive acetyl-CoA 
pathways, and in anaplerotic pathways for other fixation 
pathways, and the root position of direct C1 reduction
explains its near-universal distribution. 



33 The GOE is usually dated at 2.5 GYA, though arguments exist for
low levels of oxygen as much as 50-100 million years earlier
[147, 148]. These may be relevant dates to compare to genetically
estimated loss events in later branches of the Archaea or possibly
in the Clostridia, but they are not plausible as dates for the first
branching in the tree of Fig. 11.



3. Parsimony violation and the role of ecological
interactions



A tree is by construction a summary statistic for the 
relations among the phenotypes which are its leaves or 
internal nodes. It is not inherently a map of species de- 
scent, and takes on that interpretation only when com- 
mon ancestry is shown to explain the conditional in- 
dependence of branches given their (topological) parent 
nodes. This caution is especially important for the interpretation
of Fig. 11, which shows high parsimony in the deepest branches where
horizontal gene transfer is generally believed to have been most
intense [51, 52]. We
have argued that this behavior is consistent in a tree of 
successive optimal adaptations to varied environments, 
by organisms that could maintain little persistent vari- 
ation. Violations of parsimony that are improbable by 
evolutionary convergence contain information about con- 
tact among historically separated lineages. Under this 
interpretation the separation is primarily environmen- 
tal, with the subsequent contact identifying ecological 
co-habitation. The possible transfer of genes for the 3HP 
pathway is especially plausible, as the organisms involved 
may have shared the same extreme (alkaline) environ- 
ments and been under common selection pressure, which 
when severe is known to accelerate rates of gene transfer
[ngirrsu].

While our methods in Ref. [41] (flux-balance analysis of core
networks) may be interpreted as producing either organism models or
meta-metabolome models of consortia, the general agreement with
robust phylogenetic signatures from many different genomic
phylogenies [142, 151, 152] may still suggest a dominant role for
vertical descent among autotrophic organisms (and not merely
consortia) in the early evolution of carbon fixation.



4. A non-modern but plausible form of redundancy in the
root node

The joint WL/rTCA network was introduced into Fig. 11 to produce a
connected tree containing only autotrophic nodes. Our constraints in
choosing it led to
a kind of redundancy not found in extant fixation path- 
ways. Either WL or rTCA alone is self-maintaining (in 
a modern organism) so a network that incorporates both 
is redundantly autocatalytic. While this is an important 
and speculative departure from all known phenotypes, it 
can be argued to reduce fragilities in both WL and rTCA 
under conditions of poor catalysis or unreliable regula- 
tion of anabolism. In that respect it is a more plausible 
phenotype for a universal ancestor than any modern net- 
work. 

The enhanced robustness of the joint network follows from the
interaction of short-loop and long-loop autocatalysis. The threshold
for autocatalysis in the rTCA loop, fragile against parasitic side
reactions or unconstrained anabolism, is supported and given a
recovery mode when fed by an independent supply of acetyl-CoA from
WL. In turn, the production of a sufficient concentration of folates
to support direct C1 reduction, fragile if the long biosynthetic
pathway is unreliable, is augmented by additional carbon fixed in
rTCA. These arguments are topological, and do not make specific
reference to whether the catalysts for the underlying reactions are
enzymes. They may provide context for (perhaps multi-stage) models
of transition from primordial mineral catalysis [55, 153] to the
eventual support of carbon fixation by biomolecules.

Fig. 12 shows a numerical solution for the current flow through a
minimal version of the joint WL/rTCA network, with lumped-parameter
representations of parasitic side reactions and the net free energy
of formation of acetate. (The exact rate equations used, and their
interpretation, are provided in App. A.) In the absence of a WL
"feeder" pathway, rTCA has a sharp threshold for the maintenance of
flux through the network as a function of the free energy of
formation of its output acetate. The existence of such a sharp
threshold depending on the rate of parasitism, below which the cycle
supports no transport, has been one of the major sources of
criticism of network-autocatalytic pathways as models for
proto-metabolism [154]. When WL is added as a feeder, however, the
threshold disappears, and some nonzero flux passes through the
pathway at any positive free energy of formation of the outputs.
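
The qualitative behavior described above, a sharp threshold that is
smoothed away by a feeder, can be reproduced with a deliberately
simplified toy model. The sketch below is not the rate law of App. A:
the equation, the function name steady_state_fraction, and the
parameter names z_rtca and z_wl are assumptions introduced only for
illustration. It assumes a saturating autocatalytic term
z_rtca * x * (1 - x), a feeder term z_wl * (1 - x), and a unit-rate
drain -x, and returns the non-negative steady state.

```python
import math


def steady_state_fraction(z_rtca: float, z_wl: float) -> float:
    """Non-negative steady state x of the toy rate law

        dx/dt = z_rtca * x * (1 - x) + z_wl * (1 - x) - x.

    Illustrative assumption only; not the model solved in the paper.
    """
    if z_rtca < 0 or z_wl < 0:
        raise ValueError("rate parameters must be non-negative")
    if z_rtca == 0:
        # Without autocatalysis the balance is linear: z_wl * (1 - x) = x.
        return z_wl / (1.0 + z_wl)
    # Setting dx/dt = 0 gives z_rtca*x^2 - (z_rtca - z_wl - 1)*x - z_wl = 0;
    # the root with the plus sign is the one lying in [0, 1).
    b = z_rtca - z_wl - 1.0
    discriminant = b * b + 4.0 * z_rtca * z_wl
    return (b + math.sqrt(discriminant)) / (2.0 * z_rtca)


# Scan the autocatalytic drive for a few feeder strengths.
for z_wl in (0.0, 0.01, 0.1):
    row = [round(steady_state_fraction(z, z_wl), 3) for z in (0.5, 1.0, 2.0, 4.0)]
    print(f"z_wl={z_wl}: {row}")
```

With z_wl = 0 the toy model reduces to x = max(0, 1 - 1/z_rtca), so
the loop carries nothing below z_rtca = 1; for any z_wl > 0 the
steady state is strictly positive at every value of z_rtca.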

Chemical self-amplification, if it can be demonstrated 
experimentally, is the most plausible mechanism by 
which the biosphere can concentrate all energy flows and 
material cycles through a small, stable set of organic com- 
pounds. It supplies the molecules that are within the loop 
- and secondarily those that are made from loop interme- 
diates - above the concentrations they would have in a 
Gibbs equilibrium distribution, as a result of flow through 
the network. The fact that self-amplification is permitted 
to act in the model of Fig. 12, even below the chemical-potential
difference where the rTCA loop alone is self-sustaining, provides a
mechanism by which the loop intermediates could have been provided
in excess supply in
the earliest stages of the emergence of metabolism. We 
return in Sec. VII to a related form of robustness and selection,
which applies as anabolic pathways begin to form
from loop intermediates. 



E. The rise of oxygen, and changes in the 
evolutionary dynamics of core metabolism 

The limits of the phylometabolic tree we show in Fig. 11 fall on a
horizon that coincides with the rise of oxygen. More precisely: we
do not show branches that phylogenetically trace lineage divisions
later than this horizon, because no known divisions in carbon
fixation distinguish such later branches. Many of the late branches
contain only heterotrophs, and to the extent that post-oxygen
lineage divisions follow divisions in metabolism, they are divisions
in forms of heterotrophy. The rise of oxygen seems to have put an
end to innovation in carbon fixation, and led to a florescence of
innovation in carbon sharing. 34

FIG. 12: Graph of solutions to Eq. (A44) from App. A, shown versus
base-10 logarithms of $z_{\mathrm{rTCA}}$ and $z_{\mathrm{WL}}$. The
quantity $x$ on the z-axis is the fraction of the acetate
concentration [ACE] relative to the value it would take in an
equilibrium ensemble with carbon dioxide, reductant, and water. The
value $x = 1$ corresponds to an asymptotically zero impedance of the
chemical network, compared to the rate of environmental drain. The
parameter $z_{\mathrm{rTCA}}$ is a monotone function of the
non-equilibrium driving chemical potential to synthesize acetate,
and $z_{\mathrm{WL}}$ measures the conductance of the "feeder" WL
pathway. At $z_{\mathrm{WL}} \to 0$, the WL pathway contributes
nothing, and the rTCA network has a sharp catalytic threshold at
$z_{\mathrm{rTCA}} = 1$. For nonzero $z_{\mathrm{WL}}$, the
transition is smoothed, so some excess population of rTCA
intermediates occurs at any driving chemical potential.

On the same horizon, the high parsimony of the tree we have shown
ends, and it becomes necessary to explain complex metabolisms as a
consequence of transfer of metabolic modules among clades in which
they had evolved separately. We no longer expect that it would be
possible to explain - and to some extent to predict - these
innovations given only constraints of chemistry and invasion of new
geochemical environments. Instead, they rely chemically on
ecologically determined carbon flows, and genetically on
opportunities for transfer of genes or pathway segments. Therefore
any explanation will require some explicit model of ecological
dynamics, and may require invoking some accidents of historical
contingency. This contrast of phylometabolic reconstructions,
between later and earlier periods, illustrates our association of
parsimony violation with the role of ecosystems and explicit
contributions of multilevel dynamics to evolution.

34 By "sharing" we refer to general exchanges in which organic
compounds are re-used without de novo synthesis; we do not intend
only symbiotic associations. At the level of aggregate-ecosystem
net primary production, the exchange of organics with incomplete
catabolism may, however, reduce the free energy cost of the de novo
synthesis of biomass that supports a given level of phenotypic
diversity or specialization, allowing ecologies of complementary
specialists to partially displace ecologies of generalist
autotrophs.

It is perhaps counterintuitive, but we believe consis- 
tent, that the phylometabolic tree is more tree-like in 
the earlier era of more extensive single-gene lateral trans- 
fers, and becomes less tree-like and more reticulated, 
in the era of complex ecosystems enabled by oxygenic 
metabolisms, which may have come as much as 1.5 bil- 
lion years later. For reticulation to appear in a tree of 
reconstructed metabolisms, it is necessary that variants 
which evolved independently - as we have argued, un- 
der distinct selection pressures - be maintained in new 
environments where they can be brought into both contact and
interdependence. The maintenance of standing
variation is facilitated both by the evolution of more ad- 
vanced mechanisms to integrate genomes and limit hori- 
zontal transfer, and by the greater power density of oxy- 
genic metabolisms. 

The serine cycle used by some methylotrophic proteobacteria, shown
in Fig. 13, provides an example of the structure and complex
inheritance of a post-oxygen, heterotrophic pathway. Methylotrophs
possess both an H4MPT system transferred from methanogenic archaea
[155, 156], and a conserved THF system ancestral to the
proteobacteria (and we argue, to the universal common ancestor). In
methylotrophs, H4MPT is primarily used for the oxidation of
formaldehyde to formate, while THF can be used in both the oxidative
direction as part of the demethylation of various reduced one-carbon
compounds, and in the reduction of formate. C1 compounds are then
assimilated either as CO2 in the CBB cycle, as methylene-groups and
CO2 in the serine cycle or as formaldehyde in the ribulose
monophosphate (RuMP) cycle, in which formaldehyde is attached to
ribulose-5-phosphate to produce fructose-6-phosphate [157, 158].







FIG. 13: The serine cycle/glyoxylate-regeneration cycle of methylotrophy. Left panel shows the stoichiometric pathway overlaid 
on the autotrophic loop pathways from Fig. 7. Right panel gives a projection of the serine cycle and glyoxylate regeneration
cycle showing pathway directions; overlaps with the predecessor autotrophic pathways are labeled. 



The full substrate network of the most complex assimilatory pathway
of methylotrophy is a bicycle in which the serine cycle is coupled
to the glyoxylate regeneration cycle. This full network employs
segments of all four loop-autotrophic pathways, as well as reactions
in glycolysis, and part of the "glycine cycle". Carbon enters the
pathway at several points. Methylene groups enter through the
glycine cycle, combining with glycine to form serine. Serine is then
deaminated and reduced to pyruvate, which is combined with a CO2 in
a carboxylation to enter the core of TCA reactions. TCA arcs are
performed reductively from pyruvate to malate, and oxidatively from
succinate to malate, following the pattern of the 3HP pathway plus
anaplerotic reactions from its output pyruvate. The short-molecule
arc of 3HP is run as in the autotrophic carbon-fixation pathway
starting from propionate, but part of the long-molecule arc of 3HP
is reversed in the glyoxylate regeneration cycle. The 4HB pathway
arc, transferred from archaea, is also reversed to feed this
glyoxylate cycle, and is followed by a final additional
carboxylation unique to this pathway [159, 160].

The serine/glyoxylate cycle of methylotrophy is a
remarkable "Frankenstein's monster" of metabolism, 
stitched together from parts of all pre-existing pathways, 
but requiring almost nothing new in its own local chem- 
istry. Notably, the modules in this bacterial pathway 
which have been inherited from archaea are all reversed 
from the archaeal direction. 



F. Summary: Catalytic control as a central source 
of modularity in metabolism 

Focusing on the metabolic foundation of the biosphere 
- carbon-fixation and its interface with anabolism - we 
have seen many examples of how catalytic control is a 
central organizing principle in metabolism. The most 
complex and conserved reaction mechanisms in carbon- 
fixation often have unique (often very elaborate) metal 
centers and cofactors associated to them, reflecting the 
difficulty of the catalytic problem being solved. Not sur- 
prisingly, these reactions form the boundaries at which 
the various modules making up carbon-fixation are con- 
nected. As a result, these module boundaries form some 
of the strongest long-term constraints on evolution. They 
act as "turnstiles" along which the flow of carbon into 
the biosphere is redirected upon biogeochemical pertur- 
bations, resulting in the deepest structure in the tree of 
life. 

The catalytic control of classes of organic reactions also 
leads to a secondary source of modularity, the locking 
in of various core pathways in the elaboration of down- 
stream intermediary metabolism. The most striking ex- 
ample of this is that across the modern biosphere all an- 
abolic pathways originate in only a very small number 
of molecules, mostly within the TCA cycle, even though 
a variety of different carbon-fixation strategies are used. 
The suggested interpretation is that much of intermedi- 
ary metabolism had elaborated prior to the divergences 
in carbon-fixation. A related, but slightly different form 
of lock-in is found in the construction of methylotrophic 
pathways, which circumvents innovations in the catalytic 
control of difficult chemistry by re-using a wide range of 
parts from pre-existing carbon-fixation pathways. 



IV. COFACTORS, AND THE EMERGENCE 
AND CENTRALIZATION OF METABOLIC 
CONTROL 

Cofactors form a unique and essential class of compo- 
nents within biochemistry, both as individual molecules 
and as a distinctive level in the control over metabolism. 
In synthesis and structure they tend to be among the 
most complex of the metabolites, and unlike amino acids, 
nucleotides, sugars and lipids, they are not primary 
structural elements of the macromolecular components 
of cells. Instead, cofactors provide a limited but essen- 
tial inventory of functions, which are used widely and in 
a variety of macromolecular contexts. As a result they 
often have the highest connectivity (forming topological 
"hubs") within metabolic networks, and are required in 
conjunction with key inputs or enzymes [161-163] to complete
the most elaborate metabolisms.

In this section we will discuss how cofactors determine 
and regulate the scope of organic reactions in biochem- 
istry, and how as focal points of selection they have been 
important in the large-scale structure of evolution. In 
understanding the role of cofactors in the emergence and 
evolution of metabolism, two consequences of their func- 
tional roles are essential to acknowledge. First, as we 
have discussed, cofactor functions are central in going 
from the short-loop network autocatalysis that would 
have been abiotically favored with proper mineral sup- 
ports, to the long-loop network autocatalysis upon which 
all life today rests. As we will see, the most struc- 
turally complex cofactors are associated with the most 
catalytically complex functions within carbon-fixation, 
and thus form the most elaborate long-loop feedback clo- 
sures at the substrate level. Second, because cofactor 
functions are associated with kinetic bottlenecks within 
metabolism, their inventory of functions forms strong
long-term constraints on the evolution of new pathways, 
so innovations in cofactor synthesis can have dramatic 
effects on the large-scale structure of evolution. 



A. Introduction to cofactors as a group, and why 
they define an essential layer in the control of 
metabolism 

1. Cofactors as a class in extant biochemistry 

The biosynthesis of cofactors involves some of the most 
elaborate and least understood organic chemistry used by 
organisms. The pathways leading to several major cofac- 
tors have only very recently been elucidated or remain to 
be fully described, and their study continues to lead to 
the discovery of novel reaction mechanisms and enzymes 
that are unique to cofactor synthesis [164-166]. While
cofactor biosynthetic pathways often branch from core 
metabolic pathways, their novel reactions may produce 
special bonds and molecular structures not found else- 
where in metabolism. These novel bonds and structures
are generally central in their catalytic functions.

Structurally, many cofactors form a class in transition 
between the core metabolites and the oligomers. They 
contain some of the largest directly-assembled organic 
monomers (pterins, flavins, thiamin, tetrapyrroles), but
many also show the beginnings of polymerization of stan- 
dard amino acids, lipids or ribonucleotides. These may 
be joined by the same phosphate ester bonds that link 
RNA oligomers or aminoacyl-tRNA, or they may use dis- 
tinctive bonds (e.g. 5'-5' esters) found only in the cofactor 
class [167].

The polymerization exhibited within cofactors is distinguished
from that of oligomers by its heterogeneity. Srinivasan and
Morowitz [40] have termed them "chimeromers", because they
often include monomeric
components from several molecule classes. Examples 
are coenzyme-A, which includes several peptide units 
and an ATP; folates, which join a pterin moiety to 
para-aminobenzoic acid (PABA); quinones, which join a 
PABA derivative to an isoprene lipid tail; and a variety 
of cofactors assembled on phosphoribosyl-pyrophosphate 
(PRPP) to which RNA "handles" are esterified. 

We may understand the border between small and 
large molecules, where most cofactors are found, as more 
fundamentally a border between the use of heteroge- 
neous organic chemistry to encode biological informa- 
tion in covalent structures, and the transition to homoge- 
neous phosphate chemistry, with information carried in 
sequences or higher-order non-covalent structures. The 
chemistry of the metabolic substrate is mostly the chem- 
istry of organic reactions. Phosphates and thioesters may 
appear in intermediates, but their role generally is to pro- 
vide energy for leaving groups, enabling formation of the 
main structural bonds among C, N, O, and H. One of the 
striking scale-dependent characteristics of metabolism is that its
organic reactions, the near-universal mode of construction
for molecules of 20 to 30 carbons or fewer, cease to
be used in the synthesis of larger molecules. 35 Large
oligomeric macromolecules are almost entirely synthesized
using the dehydration potential of phosphates [170]
to link monomers drawn from the inventory [39] of small
core metabolites. Many cofactors have structure of both 
kinds, and they are the smallest molecules that as a class 
commonly use phosphate esters as permanent elements 
of structure [171].

Finally, cofactors are distinguished by structure- 
function relations determined mostly at the single- 
molecule scale. The monomers that are incorporated into 
macromolecules are often distinguished by general prop- 
erties, and only take on more specific functional roles 
that depend strongly on location and context [172, 173].
In contrast, the functions of cofactors are specific, often
finely tuned by evolution [55], and deployable in a
wide range of macromolecular contexts. Usually they
are carriers or transfer agents of functional groups or
reductants in intermediary metabolism [174]. Nearly half
of enzymes require cofactors as coenzymes [171, 174].
If we extend this grouping to include chelated metals
[175, 176] and clusters, ranging from common iron-sulfur
centers to the elaborate metal centers of gas-handling
enzymes [50, 120], more than half of enzymes
require coenzymes or metals in the active site.

35 Even siderophores, among the most complex of widely-used
organic compounds, are often elaborations of functional centers
that are small core metabolites, such as citrate [168, 169].

The universal reactions of intermediary metabolism
depend on only about 30 cofactors [174] (though this
number depends on the specific definition used). Major
functional roles include 1) transition-metal-mediated redox
reactions (heme, cobalamin, the nickel tetrapyrrole
F430, chlorophylls 36), 2) transport of one-carbon groups
that range in redox state from oxidized (biotin for carboxyl
groups, methanofurans for formyl groups) to reduced
(lipoic acid for methylene groups, S-adenosyl methionine,
coenzyme-M and cobalamin for methyl groups),
with some cofactors spanning this range and mediating
interconversion of oxidation states (the folate family
interconverting formyl to methyl groups), 3) transport
of amino groups (pyridoxal phosphate, glutamate, glutamine),
4) reductants (nicotinamide cofactors, flavins,
deazaflavins, lipoic acid, and coenzyme-B), 5) membrane
electron transport and temporary storage (quinones), 6)
transport of more complex units such as acyl and aminoacyl
groups (pantetheine in CoA and in the acyl-carrier
protein (ACP), lipoic acid, thiamine pyrophosphate), 7)
transport of dehydration potential from phosphate esters
(nucleoside di- and tri-phosphates), and 8) sources of
thioester bonds for substrate-level phosphorylation and
other reactions (pantetheine in CoA).
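
The eight functional roles listed above can be collected into a small lookup table, which makes it easy to see which cofactors recur in more than one role. The sketch below only transcribes the list; the short role labels, the dictionary layout, and the helper name `roles_of` are illustrative choices and not part of the source.

```python
# Functional roles of cofactors, transcribed from the enumerated list above.
# Role labels and data layout are illustrative assumptions, not the authors'.
COFACTOR_ROLES = {
    "transition-metal-mediated redox": [
        "heme", "cobalamin", "F430", "chlorophylls"],
    "one-carbon transport": [
        "biotin", "methanofurans", "lipoic acid", "S-adenosyl methionine",
        "coenzyme-M", "cobalamin", "folates"],
    "amino-group transport": [
        "pyridoxal phosphate", "glutamate", "glutamine"],
    "reductants": [
        "nicotinamide cofactors", "flavins", "deazaflavins",
        "lipoic acid", "coenzyme-B"],
    "membrane electron transport": ["quinones"],
    "acyl and aminoacyl transport": [
        "pantetheine", "lipoic acid", "thiamine pyrophosphate"],
    "dehydration potential": ["nucleoside di- and tri-phosphates"],
    "thioester source": ["pantetheine"],
}


def roles_of(cofactor):
    """Return the sorted list of roles in which a cofactor is named."""
    return sorted(role for role, members in COFACTOR_ROLES.items()
                  if cofactor in members)


if __name__ == "__main__":
    for name in ("lipoic acid", "cobalamin", "pantetheine"):
        print(name, "->", roles_of(name))
```

Lipoic acid, for instance, is named under three of the eight roles, which is one simple way to see the "hub" character of cofactors described earlier.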



2. Roles as controllers, and consequences for the emergence 
and early evolution of life 

Cofactors fill roles in network or molecular catalysis be- 
low the level of enzymes, but they share with all catalysts 
the property that they are not consumed by participat- 
ing in reactions, and therefore are key loci of control over 
metabolism. Cofactors as transfer agents are essential 
to completing many network-catalytic loops. In associ- 
ation with enzymes, they can create channels 37 and active
sites 38, and thus they facilitate molecular catalysis.
Through the limits in their own functions or in the functional
groups they transport through networks, they may
impose constraints on chemical diversity or create bottlenecks
to evolutionary innovation. The previous sections
have shown that many module boundaries in carbon fixation
and core metabolism are defined by idiosyncratic
reactions, and we have noted that many of these idiosyncrasies
are associated with specific cofactor functions.

36 It is natural in many respects to include ferredoxins (and related
flavodoxins) in this list. Although not cofactors by the criteria of
size and biosynthetic complexity, these small, widely-diversified,
ancient, and general-purpose Fe2S2, Fe3S4, and Fe4S4-binding
polypeptides are unique low-potential (high-energy) electron
donors. Reduced ferredoxins are often generated in reactions
involving radical intermediates in iron-sulfur enzymes, described
below in connection with electron bifurcation.

37 An example is the role of cobalamin as a C1 transfer agent to
the nickel reaction center in the acetyl-CoA synthase from a
corrinoid iron-sulfur protein [177-179].

Cofactors, as topological hubs, and participants in re- 
actions at high-flux boundaries in core and intermediary 
metabolism, are focal points of natural selection. The 
adaptations available to key atoms and bonds include 
altering charge or pKa, changing energy level spacing 
through non-local electron transport, or altering orbital 
geometry through ring strains. Divergences in low-level 
cofactor chemistry may alter the distribution of func- 
tional groups and thereby change the global topology of 
metabolic networks, 39 and some of these changes map 
onto deep lineage divergences in the tree of life. 

Most research on the origin of life has focused ei- 
ther on the metabolic substrate [5, 180] or catalysis by
RNA [181], but we believe the priority of cofactors deserves
(and is beginning to receive) greater consideration
[182, 183]. In the expansion of metabolic substrates
from inorganic inputs, the pathways to produce
even such complex cofactors as folates et alia are com- 
parable in position and complexity to those for purine 
RNA, while some for functional groups such as nicoti- 
namide [182] or chorismate are considerably simpler.
Therefore, even though it is not known what catalytic 
support or memory mechanisms enabled the initial elab- 
oration of metabolism, any solutions to this problem 
should also support the early emergence of at least the 
major redox and C- and N-transfer cofactors. Conversely,
the pervasive dependence of biosynthetic reactions
on cofactor intermediates makes the expansion of
protometabolic networks most plausible if it was supported
by contemporaneous emergence and elaboration
of cofactor groups. In this interpretation cofactors occupy
an intermediate position in chemistry and complexity,
between the small-metabolite and oligomer levels
[182]. They were the transitional phase when the reaction
mechanisms of core metabolism came under selection
and control of organic as opposed to mineral-based
chemistry, and they provided the structured foundation
from which the oligomer world grew.

We argue next that a few properties of the elements
have governed both functional diversification and evolutionary
optimization of many cofactors, especially those
associated with core carbon-fixation. We focus in particular
on heterocycles with conjugated double bonds
incorporating nitrogen, and on the groups of functions that
exploit special properties of bonds to sulfur atoms.

38 An example is the role of TPP as the reaction center in the
pyruvate-ferredoxin oxidoreductase (PFOR), which lies at the
end of a long electron-transport channel formed by Fe-S clusters
[84].

39 A well-understood example is the repartitioning of C1 flux from
methanopterins versus folates [41, 65]. The same adaptation
that enables formylation of methanopterins within an exclusively
thioester system, where the homologous folate reaction
requires ATP, reduces the potential for methylene-group transfer,
and necessitates the oxidative formation of serine from 3PG
in methanogens, which is not required of acetogens.



B. The cofactors derived from purine RNA

Most of the cofactors that use heterocycles for their
primary functions have biosynthetic reactions closely related
to those for purine RNA. These reactions are performed
by a diverse class of cyclohydrolase enzymes,
which are responsible for the key ring-formation and
ring-rearrangement steps. The cyclohydrolases can split and
reform the ribosyl ring in PRPP, jointly with the 5- and
6-membered rings of guanine and adenine. Five biosynthetically
related cofactor groups are formed in this way.
Four of these - the folates, flavins, deazaflavins and thiamin
- are formed from GTP, as shown in Fig. 14.



Folates: The folates are structurally most similar to 
GTP, but have undergone the widest range of secondary 
specializations, particularly in the Archaea. They are 
primarily responsible for binding C1 groups during reduction
from formyl to methylene or methyl oxidation
states, and their secondary diversifications are apparently 
results of selection to tune the free-energy landscape of 
these oxidation states. 

Flavins and deazaflavins: The flavins are tricyclic 
compounds formed by condensation of two pterin groups, 
while deazaflavins are synthesized through a modified 
version of this pathway, in which one pterin group 
is replaced by a benzene ring derived from chorismate.
Flavins are general-purpose reductants, while
deazaflavins are specifically associated with methanogen- 
esis. 

Thiamin: Thiamin combines a C-N heterocycle common
to the GTP-derived cofactors with a thiazole group
(so incorporating sulfur), and shares functions with both
the purine cofactor group and the alkyl-thiol group re- 
viewed in the next subsection. 

Histidine: The last "cofactor" in this group is the amino 
acid histidine, synthesized from ATP rather than GTP 
but using similar reactions. Histidine is a general acid- 
base catalyst with unique pKa, which in many ways func- 
tions as a "cofactor in amino acid form" [40].

FIG. 14: Key molecular re-arrangements in the network leading from AIR to purines and the purine-derived cofactors. The
3.5.4 class of cyclohydrolases (red) converts FAICAR to IMP (precursor to purines), and subsequently converts GTP to folates
and flavins by opening the imidazole ring. Acting on the 6-member ring of ATP and on a second attached PRPP, the enzyme
3.5.4.19 initiates the pathway to histidinol. The thiamine pathway, which uses the unclassified enzyme ThiC to hydrolyze
imidazole and ribosyl moieties, is the most complex, involving multiple group rearrangements (indicated by colored atoms).
This complexity, together with the subsequent attachment of a thiazole group, leads us to place thiamine latest in evolutionary
origin among these cofactors.



We will first describe in detail the remarkable role of 
the folate group in the evolutionary diversification of the 
Wood-Ljungdahl pathway, and then return to general 
patterns found among the purine-derived cofactors, and 
their placement within the elaboration of metabolism and 
RNA chemistry. 



1. Folates and the central superhighway of C1 metabolism

Members of the folate family carry C1 groups bound
to either the N5 nitrogen of a heterocycle derived from
GTP, an exocyclic N10 nitrogen derived from a para-aminobenzoic
acid (PABA), or both. The two most
common folates are tetrahydrofolate (THF), ubiquitous
in bacteria and common in many archaeal groups,
and tetrahydromethanopterin (H4MPT), essential for
methanogens and found in a small number of late-branching
bacterial clades. Other members of this family
are exclusive to the archaeal domain and are structural
intermediates between THF and H4MPT. Two kinds of
structural variation are found among folates, as shown
in Fig. 15. First, only THF retains the carbonyl group
of PABA, which shifts electron density away from N10
via the benzene ring, and lowers its pKa relative to N5
of the heterocycle. All other members of the family lack
this carbonyl. Second, all folates besides THF incorporate
one or two methyl groups that impede rotation between
the pteridine and aryl-amine planes, changing the
relative entropies of formation among different binding
states for the attached C1 [?, 65, 184].

Folates mediate a diverse array of C1 chemistry, various
parts of which are essential in the biosynthesis of
all organisms [65]. The collection of reactions, summarized
in Fig. 4, has been termed the "central superhighway"
of one-carbon metabolism. Functional groups supplied
by folate chemistry, connected by interconversion
of C1 oxidation states along the superhighway, include
1) formyl groups for synthesis of purines, formyl-tRNA,
and formylation of methionine (fMet) during translation,
2) methylene groups to form thymidylate, which are
also used in many deep-branching organisms to synthesize
glycine and serine, forming the ancestral pathway to
these amino acids [41], and 3) methyl groups which may
be transferred to S-adenosyl-methionine (SAM) as a general
methyl donor in anabolism, to the acetyl-CoA synthase
to form acetyl-CoA in the Wood-Ljungdahl pathway,
or to coenzyme-M where the conversion to methane
is the last step in the energy system of methanogenesis.



The variations among folates, shown in Fig. 15, leave
the charge, pKa and resulting C-N bond energy at N5
roughly unaffected, while the N10 charge, pKa, and
C-N bond energy change significantly across the family.
This charge effect, together with entropic effects
due to steric hindrance from methyl groups, can sharply
vary the functional roles that different folates play in anabolism.

[Figure panels: Synthesis; Structural variation. Labels: A = Archaea, B = Bacteria.]

FIG. 15: Structural variants among cofactors in the folate
family, shown with the biosynthetic pathways that produce
these variations. Pteridine and benzene groups shown in blue,
and methyl groups that regulate steric hindrance shown in
red.

The biggest difference lies between THF and H4MPT.
In THF, the N10 pKa is as much as 6.0 natural-log units
lower than that of N5 [185]. The resulting higher-energy
C-N bond cannot be formed without hydrolysis of one
ATP, either to bind formate to N10 of THF, or to cyclize
N5-formyl-THF to form N5,N10-methenyl-THF (see
Fig. 4). 40 After further reduction, the resulting methylene
is readily transferred to lipoic acid to form glycine
and serine, in what we have termed the "glycine cycle"
[41] (the lipoyl-protein based cycle on the right in
Fig. 4).

In contrast, in H4MPT the difference in pKa between
N10 and N5 is only 2.4 natural-log units. The lower
C-N10 bond energy permits spontaneous cyclization of
N5-formyl-H4MPT, following (also ATP-independent)
transfer of formate from a formyl-methanofuran cofactor.
Through this sequence, methanogens fix formate in
an ATP-independent system using only redox chemistry.

The initial free energy to attach formate to methanofuran
is provided by the terminal methane released in methanogenesis
(the Co-M/Co-B cycle in Fig. 4). The resulting
downstream methylene group, however, has too little energy
as a leaving group to transfer to an alkyl-thiol cofactor,
so methanogens sacrifice the ability to form glycine
and serine by direct reduction of formate.

The reconstructed ancestral use of the 7-9 reactions
in Fig. 4 is to reduce formate to acetyl-CoA or
methane. However, the reversibility of many reactions
in the sequence, possibly requiring substitution of
reductant/oxidant cofactors, allows folates to accept and donate
C1 groups in a variety of oxidation states, from and
into many pathways including salvage pathways.
Methylotrophic proteobacteria which have obtained H4MPT
through horizontal gene transfer [156, 157] may run the
full reaction sequence in reverse. They may use either
H4MPT to oxidize formaldehyde or THF to oxidize
various methylated C1 compounds, in both cases leading
to formate, or other intermediary oxidation states
(from THF) as inputs to anabolic pathways. In many
late-branching bacteria, some archaea, and eukaryotes,
the THF based pathway may run in part oxidatively
and in part reductively, through connections to either
gluconeogenesis/glycolysis or glyoxylate metabolism. In
these organisms serine (derived through oxidation, amination
and dephosphorylation from 3-phosphoglycerate)
or glycine (derived through amination of glyoxylate) become
the sources of transferable methyl groups in anabolism.
This versatility has preserved the folate pathway
as an essential module of biosynthesis in all domains
of life, and at the same time has made it a pivot of
evolutionary variation.

40 This reaction is the mirror image of the cyclization of N10-formyl-THF.
This latter reaction is spontaneous. We will argue below
that the alternative cyclization from N5-formyl-THF, previously
only recognized as a salvage pathway [186], may reflect an
unrecognized function of the cycloligase as an enzyme for
ATP-dependent formate incorporation.
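
To give a sense of scale for the pKa gaps quoted in this subsection (6.0 natural-log units for THF, 2.4 for H4MPT), a difference of Δ natural-log units in an equilibrium constant corresponds to a free-energy difference of RT·Δ. The sketch below evaluates this at 298.15 K. The temperature, the function name, and the reading of the quoted gaps as differences in ln Ka are assumptions made here for illustration; the result is only the protonation free-energy gap between the two nitrogens, not the C-N bond energy itself.

```python
# Convert a gap quoted in natural-log units into a free-energy difference.
# Assumption: the gap is a difference in ln(Ka), so that dG = R * T * gap.
R = 8.314462618  # molar gas constant, J/(mol K)


def ln_gap_to_kj_per_mol(delta_ln_ka, temperature_k=298.15):
    """Free-energy difference in kJ/mol for a gap in natural-log units."""
    return R * temperature_k * delta_ln_ka / 1000.0


if __name__ == "__main__":
    # Gaps between N10 and N5 as quoted in the text above.
    for label, gap in (("THF", 6.0), ("H4MPT", 2.4)):
        print(f"{label}: {ln_gap_to_kj_per_mol(gap):.1f} kJ/mol")
```

Under these assumptions the THF gap corresponds to roughly 14.9 kJ/mol and the H4MPT gap to roughly 5.9 kJ/mol at 298.15 K.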



2. Refinement of folate-C1 chemistry maps onto lineage
divergence of methanogens

The structural and functional variation within the 
folate family illustrates the way that selection, acting 
on cofactors, can create large-scale re-arrangements in 
metabolism, enabling adaptations that are reflected in 
lineage divergences. The free-energy cascade described in 
the last section, linking ATP hydrolysis, the charge and
pKa of the N10 nitrogen, and the leaving-group activity
of the resulting bound carbon for transfer to alkyl-thiol
cofactors or other anabolic pathways, is a fundamental
long-range constraint of folate-C1 chemistry. A comparative
analysis of gene profiles in pathways for glycine and
serine synthesis, explained in Ref. [41], shows that while
the constraint cannot be overcome, its impact on the form
of metabolism can vary widely depending on the structure
of the mediating folate cofactor.

The annotated role for ATP hydrolysis in WL autotrophs
is to attach formate to N10 of THF, initiating
the reduction sequence. However, many deep-branching
bacteria and archaea show no gene for this reaction, while
multiple lines of evidence indicate that THF nonetheless
functions as a carbon-fixation cofactor in these organisms
[41]. In almost all cases where an ATP-dependent
N10-formyl-THF synthase is absent, an ATP-dependent
N5-formyl-THF cycloligase [41, ?] is found. This is
another case where a broad evolutionary context allows
an alternate interpretation. N5-formyl-THF cycloligase
was originally discovered in mammalian systems, where
its function has been highly uncertain and hypothesized
to be a salvage mechanism as part of a futile cycle
[186, 187], before being found to be widespread across
the tree of life [41]. If we deduce by reconstruction, however,
that ancestral folate chemistry operated in the fully
reductive direction, and that in H4MPT systems formate
is attached at the N5 position, while in THF systems formate
is attached at the N10 position, the widespread distribution
of the cycloligase takes on a different meaning.
It is plausible that the N5-formyl-THF cycloligase allows
a formate incorporation pathway that is an evolutionary
intermediate between the commonly recognized pathway
using THF and its evolutionary derivative using H4MPT
(see Fig. 4). The ATP-dependent cycloligase produces
N5,N10-methenyl-THF from N5-formyl-THF, which may
potentially form spontaneously due to the higher N5
pKa [187]. ATP hydrolysis is thus specifically linked
to the N10-carbon bond which is the primary donor for
carbon groups from folates. Methanogens, in contrast,
escape the dependence on ATP hydrolysis by decarboxylating
PABA before it is linked to pteridine to form
methanopterin (see Fig. 15), but they sacrifice methyl-group
donation from H4MPT to most anabolic pathways,
making methanogenesis viable only in clades that evolved
the oxidative pathway to serine from 3-phosphoglycerate.

We noted in Sec. III D that the elimination of one ATP-dependent
acyl-CoA synthase in acetogens reduces the
free energy cost of carbon fixation relative to rTCA cycling.
The decoupling of the formate-fixation step on
methanopterins from ATP hydrolysis is a further significant
innovation, lowering the ATP cost for uptake of
CO2. This divergence of H4MPT from THF, and a related
divergence of deazaflavins from flavins (see Fig. 16),
follow phylogenetically (and, we believe, were responsible
for) the divergence of the methanogens from other
euryarchaeota [41].

We regard this example as representative of the way 
that innovations in cofactor chemistry more generally 
mediated large-scale rearrangements in metabolism, and 
corresponding evolutionary (and ecological) divergences 
of clades. Another similar example comes from the 
quinones, a diverse family of cofactors mediating mem- 
brane electron transport [188]. Ref. [189] found that the
synthetic divergence of mena- and ubiquinone follows
the pattern of phylogenetic diversification within proteobacteria.
δ- and ε-proteobacteria use menaquinone,
γ-proteobacteria use both mena- and ubiquinone, and
α- and β-proteobacteria use only ubiquinone. Because
mena- and ubiquinone have different midpoint potentials,
it was suggested that their distribution reflects changes
in environmental redox state as the proteobacteria diversified
during the rise of oxygen [189, 190]. Such phylogenetic
divergences may alternatively be thought of as
divergences driven by the closure of more advantageous
long-loop feedback cycles.
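
The quinone distribution just described is compact enough to write down as a table. The sketch below is only a transcription of that sentence, reading the class labels as delta, epsilon, gamma, alpha and beta; the dictionary name, the spelled-out class keys, and the helper `classes_using` are illustrative assumptions.

```python
# Quinone usage across proteobacterial classes, as stated in the text above.
# Names and layout are illustrative; only the assignments come from the text.
QUINONE_USE = {
    "delta": {"menaquinone"},
    "epsilon": {"menaquinone"},
    "gamma": {"menaquinone", "ubiquinone"},
    "alpha": {"ubiquinone"},
    "beta": {"ubiquinone"},
}


def classes_using(quinone):
    """Return the sorted proteobacterial classes that use a given quinone."""
    return sorted(cls for cls, quinones in QUINONE_USE.items()
                  if quinone in quinones)


if __name__ == "__main__":
    for quinone in ("menaquinone", "ubiquinone"):
        print(quinone, "->", classes_using(quinone))
```

In this table the gamma class is the only one listed under both quinones.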



3. Relation of the organic superhighway to minerals 

A very wide range of circumstantial arguments has
been made for the emergence of biochemistry from
the reduced-mineral/seawater chemistry of hydrothermal
vents. These include: detailed accounts of the capacity
of a range of geochemical energy systems to support
extant life [15, ?] 41, detailed similarities between
transition-metal/sulfide mineral unit cells and
metallo-enzyme active sites [68, 119, 191], the widespread use
of radical mechanisms in assembly of metal-center enzymes
[120], the more general presence of chelated
metals in ubiquitous and conserved cofactors and enzymes
(particularly tetrapyrroles and ferredoxins), the
richness of vent environments in geometry, surface catalysis
[55, 192, 193], thermal and pH gradients, and the
greater similarity of the aqueous redox environment of
hydrothermal fluids to biochemistry, than of atmospheric
free-radical chemistry or the quenched ion chemistry in
the interstellar medium [?]. While these arguments
still leave too many circumstantial steps to have
created consensus that metabolism emerged through
self-organization from geochemistry [154], among the many
speculations about what was necessary for the first
metabolism, the geochemical hypothesis is grounded in
the widest array of relevant empirical evidence. The
geochemical hypothesis has also been circumstantially supported
by experimental evidence that minerals can catalyze
reactions in the citric-acid cycle [71], and an extensive
range of reductions [196, 197], including synthesis of
acetyl-thioesters [55].

The distinctive features of biochemical C1 reduction
are the attachment of formate to tuned heterocyclic or
aryl-amine nitrogen atoms for reduction, and the transfer
of reduced C1 groups to sulfhydryl groups (of SAM, lipoic
acid, or CoM). In the mineral-origin hypothesis for direct
reduction, the C1 groups were adsorbed at metals and
reduced either through crystal oxidation [193] or by reductant in
solution. The transfer of reduced C1 groups to alkyl-thiol
cofactors may show continuity with reduction on
metal-sulfide minerals. However, the mediation of reduction by
nitrogens appears a distinctively biochemical innovation.

41 A subset of the entries in Table 1 of Ref. [25], involving Fe2+
reduction or autotrophic methanogenesis, can be applied directly
to early-earth environments. Many entries in their table of
environments involve sulfates, nitrates, ferric iron, or small amounts
of molecular oxygen (the Knallgas reaction) as terminal electron
acceptors. The organic conversions detailed in the paper remain
a basis for habitability analysis, but plausible pathways in the
Hadean will be limited by the alternative terminal electron
acceptors.



4. Cyclohydrolases as the central enzymes in the family,
and the resulting structural homologies among cofactors

The common reaction mechanism unifying the purine- 
derived cofactors is an initial hydrolysis of both 
purine and ribose rings performed by cyclohydro- 
lases assigned EC numbers 3.5.4 (see Fig. 14). 
These enzymes are responsible for the synthesis of 
inosine-monophosphate (IMP, precursor to AMP and 
GMP) from 5-formamidoimidazole-4-carboxamide ri- 
bonucleotide (FAICAR), for the first committed steps in 
the syntheses of both folates and flavins from GTP, and 
for the initial ring-opening step in the synthesis of histidine
from ATP and PRPP. Fig. 14 shows the key steps
in the network synthesizing both purines and the pterins, 
folates, flavins, thiamine, and histidine. 

The common function of the 3.5.4 cyclohydrolases is 
hydrolysis of rings on adjacent nucleobase and ribose 
groups, or the formation of cycles by ligation of ring 
fragments. In all cases, the ribosyl moieties come from 
phosphoribosyl-pyrophosphate (PRPP). In the synthesis
of pterins from GTP and of histidinol from ATP, both a 
nucleobase cycle and a ribose are cleaved. In pterin syn- 
thesis, the imidazole of guanine and the purine ribose are 
cleaved. In histidine synthesis, the six-membered ring of 
adenine is cleaved (at a different bond than the one syn- 
thesized from FAICAR), and the ribose comes from a 
secondary PRPP. 

By far the most complex synthesis in this family is that 
of thiamin from aminoimidazole ribonucleotide (AIR). 
This sequence begins with an elaborate molecular rear- 
rangement, performed in a single step by the enzyme 
ThiC [166] 42. While this enzyme is unclassified, and its
reaction mechanism incompletely understood, it shares 
apparent characteristics with members of the 3.5.4 cy- 
clohydrolases. As in the first committed steps in the 
synthesis of folates and flavins from GTP, both a ribose 
ring and a 5-member heterocycle are cleaved and sub- 
sequently (as in folate synthesis) recombined into a 6- 
member heterocycle. The complexity of this enzymatic 
mechanism makes a pre-enzymatic homologue to ThiC 
difficult to imagine, and suggests that thiamin is both of 
later origin, and more highly derived, than other cofac- 
tors in this family. This derived status is supported by 
the fact that the resulting functional role of thiamin is not 
performed on the pyrimidine ring itself, but rather on the 
thiazole ring to which it is attached, and which is likewise 
created in an elaborate synthetic sequence [166]. The
reactions involving TPP do not directly create bonds to the
sulfur atom, but instead use the carbon between it and
the positively charged nitrogen. It seems likely, however, 
that the sulfur indirectly contributes to the properties of 
that carbon, through some combination of electrostatic, 
resonance, or possibly ring-straining interactions. 

Fig. [16] shows the detailed substrate rearrangement in
the sub-network leading from GTP to methanopterins,
folates, riboflavin, and the archaeal deazaflavin F420. In
the pterin branch, both rings of neopterin are synthesized
directly from GTP, and an aryl-amine originating
in PABA provides the second essential nitrogen atom.
PABA is either used directly (in folates) or decarboxylated
with attachment of a PRPP (in methanopterins) to
vary the pKa of the amine. In contrast, the flavin branch
is characterized by the integration of either ribulose (in
riboflavin) or chorismate (in F420) to form the internal
rings. Two molecules of 6,7-dimethyl-8-(D-ribityl)lumazine
are condensed to form riboflavin, whereas a single GTP with
chorismate forms F420.







42 Eukaryotes use an entirely different pathway, in which the 
pyrimidine is synthesized from histidine and pyridoxal-5- 

FIG. 16: The substrate modifications leading from GTP to
the four major cofactors H4MPT, THF, riboflavin (in FAD)
and the archaeal homologue deazaflavin F420. The branches
indicating substrate diversification may also reflect an
evolutionary lineage.



The cyclohydrolase reactions are the innovation en- 
abling the biosynthesis of this whole family of cofactors, 
and importantly, of purine RNA itself. Except for TPP,
the distinctions among purine-derived cofactors are minor
secondary modifications on a background structured
by PRPP and C-N heterocycles. Chorismate, precursor
to PABA and the unique source of single benzene rings 
in biochemistry, is the only other developed sub-network 
within metabolism, besides purine synthesis, on which 
this family draws. Flexibility in the ways that choris- 
mate is modified to control electron density, and the way 
the benzene ring is combined with other heterocycles, 
contributes to the combinatorial elaboration within the 
family. 



5. Placing the members of the class within the network 
expansion of metabolism 

The following observations suggest to us that most of 
the purine-derived cofactors (possibly excepting thiamin) 
were available contemporaneously with monomer purine 
RNA. 

The current understanding of protein cyclohydrolases 
does not suggest other, simpler mechanisms by which 
similar reactions might first have been catalyzed. 
However, at whatever stage catalysts capable of inter- 
converting AIR, AICAR, FAICAR, and IMP first be- 
came available, there is no compelling reason to be- 
lieve that pteridines were not formed contemporaneously. 
If the chorismate pathway (which begins in the sugar- 
phosphate network) had also arisen by that time, there 
is no compelling reason to believe that folates and flavins 
were not likewise available. Particularly if the early cata- 
lysts were primitive, opening reaction mechanisms at the 
level of the first three EC numbers but not restricting 
molecular substrates, it would be difficult to argue that 
molecules generally resembling this cofactor class could 
have been reliably excluded from a monomer-purine RNA 
world. 44 

Conversely, the patterns that characterize current 
metabolism as a recursive network expansion [161, 162]
about inorganic inputs are most easily understood as a 
reflection of the organic-chemical possibilities opened by 
cofactors. Pterins, as donors of activated formyl groups, 
support (among other reactions) the synthesis of purines, 
forming a short autocatalytic loop. Similarly, flavins 
would have augmented redox reactions. Finally, it has
long been recognized that acid/base catalysis is uniquely
served by histidine, which has a pKa ≈ 6.5 on the
ε-nitrogen, a property not found among any biological
ribonucleotides (though possible for some substituted
adenine derivatives) [202].
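
As a rough numerical illustration of why a pKa near 6.5 suits acid/base catalysis, the following sketch applies the Henderson-Hasselbalch relation to the value quoted above. The function name and the comparison pKa of 4.0 are illustrative assumptions introduced here, not values taken from the cited sources.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


HIS_PKA = 6.5        # value quoted in the text for histidine
REFERENCE_PKA = 4.0  # hypothetical group that titrates far from neutral pH

# Compare the two groups across a physiologically relevant pH window.
for pH in (6.0, 6.5, 7.0, 7.5):
    his = fraction_protonated(pH, HIS_PKA)
    ref = fraction_protonated(pH, REFERENCE_PKA)
    print(f"pH {pH:.1f}: pKa 6.5 -> {his:.2f} protonated; pKa 4.0 -> {ref:.4f}")
```

Near neutral pH, a group with pKa 6.5 is present in both its protonated and deprotonated forms in comparable amounts, so it can serve as either proton donor or proton acceptor; a group whose pKa lies far from neutral is almost entirely in one form.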

Within the class of GTP-derived cofactors, a sub- 
structure may perhaps be suggested: the dimer condensa- 



43 For some reactions, the abstraction of enzyme mechanism is
advanced enough to identify small-molecule organocatalysts that
could have provided similar functions [199, 200].
44 Whether the first RNA were produced in this way, or through
structurally very dissimilar stages, is a currently active
question [201].



tion that forms riboflavin is a hierarchical use of building 
blocks formed from GTP. Although simple and consisting 
of a single key reaction, this could reflect a later stage of 
refinement. It is recognized [203] that flavins are somewhat
specialized reductants, both biosynthetically and
functionally more specific than the much simpler nicotinamide
cofactors, which plausibly preceded them [182].



6. Purine-derived cofactors selected before RNA itself, as 
opposed to having descended from an RNA world defined 
through base pairing? 

The overlap between RNA and cofactor biosynthesis, 
and the incorporation of AMP in several cofactors (where 
it serves primarily as a "handle" for docking), has been
noticed and given the interpretation that cofactors are a
degenerated relic of an oligomer RNA world [171]. The
only significant logical motivation to place oligomer RNA 
prior to small-molecule cofactors (which are of compara- 
ble complexity to monomer RNA) is a premise that the 
elaboration of biosynthesis required selected catalysts, 
and that RNA base pairing is the least-complex plausible 
mechanism supporting (specifically, Darwinian) selection 
and persistence of the required catalysts. 

This is a complex premise, as it requires not only 
organosynthesis of RNA, but also chiral selection and 
mechanisms to enable base pairing and (presumably 
template-directed) ligation [204] . 45 In comparison, small- 
molecule catalysis by either RNA [205] or related cofac- 
tors may be considered in any context that supports their 
synthesis. If chemical mechanisms are found which sup- 
port structured organosynthesis and selection - a require- 
ment for any metabolism-first theory of the origin of life 
- the default premise may favor simplicity: that hete- 
rocycles were first selected as cofactors, and that purine 
RNA, only one among many species maintained by the 
same generalized reactions, was subsequently selected for 
chirality, base-pairing, and ligation. 



45 A particular problem for RNA replication is the steric
restriction to 3'-5' phosphate esters, over the kinetically favored 2'-5'
linkage.

46 The relative importance of synthesis and selection depends on
whether opening access to a space of reactions, or concentrating
flux within a few channels in that space, is the primary
limit on the emergence of order at each phase in the elaboration
of metabolism. Following our earlier arguments about the
need for autocatalysis, selection will be essential in some stages,
and this remains an important problem for metabolism-first
premises [154]. Chemical selection criteria derived from differential
growth rate pose no problem in the domain of small-molecule
organocatalysis, but the identification of plausible mechanisms to
preserve selected differences remains an important area of work.
Most mechanisms that do not derive from RNA base pairing involve
separation by spatial geometry or material phases, including
porous-medium processes akin to invasion percolation [153],
or more general proposals for compositional inheritance [206-208],
abstracted from models of coacervate chemistry.

C. The alkyl-thiol cofactors

The major chemicals in this class include the sul- 
fonated alkane-thiols coenzyme-B (CoB) and coenzyme- 
M (CoM), cysteine and homocysteine including the 
activated forms S-adenosyl-homocysteine (which under 
methylation becomes SAM), lipoic acid, and pantetheine 
or pantothenic acid, including pantetheine-phosphate.
The common structure of the alkyl-thiol cofactors is an 
alkane chain terminated by one or more sulfhydryl (SH) 
groups. In all cases except lipoic acid, a single SH is 
bound to the terminal carbon; in lipoic acid two SH 
groups are bound at sub-adjacent carbons. Differences 
among the alkyl-thiol cofactors arise from their biosyn- 
thetic context, the length of their alkane chains, and per- 
haps foremost the functional groups that terminate the 
other ends of the chains. These may be as simple as sul- 
fones (in CoB) or as complex as peptide bonds (in CoA) . 

Cofactors in this class serve three primary functions, 
as reductants (cysteine, CoB, pantetheine, and one sulfur 
on lipoic acid), carriers of methyl groups (CoM, SAM, 
one sulfur on lipoic acid), and carriers of larger func- 
tional groups such as acyl groups (lipoic acid in lipoyl 
protein, phosphopantotheine in acyl-carrier protein). A 
highly specialized role in which H is a leaving group is the 
formation of thioesters at carboxyl groups (pantothenic 
acid in CoA, lipoic acid in lipoyl protein) This function 
is essential to substrate- level phosphorylation |209j . and 
appears repeatedly in the deepest and putatively oldest 
reactions in core metabolism. A final function closely re- 
lated to reduction is the formation and cleavage of S — S 
linkages by cysteine in response to redox state, which is a 
major controller of both committed and plastic tertiary 
structure in proteins. The sulfur atoms on cysteine often 
form coordinate bonds to metals in metallo-enzymes, a 
function that we may associate with protein ligands, in 
contrast to the more common nitrogen atoms that coor- 
dinate metals in pyrrole cofactors. 

The properties of the alkyl-thiol cofactors derive 
largely from the properties of sulfur, which is a "soft"
period-3 element [210] that forms relatively unstable
(usually termed "high-energy") bonds with the hard
period-2 element carbon. For the alkyl-thiol cofactors
in which sulfur plays direct chemical roles, three main
bonds dictate their chemistry: S-C, S-S, and S-H.
Sulfur can also exist in a wide range of oxidation states,
and for this reason often plays an important role in energy
metabolism [211], particularly for chemotrophs, and
due to its versatility has been suggested to precede oxygen
in photosynthesis [212]. The electronic versatility of
sulfur and the high-energy C-S bonds combine with the
large atomic radius of sulfur to give access to additional 
geometrical, electronic and ring-straining possibilities not 
available to CHON chemistry. 

Although not alkyl-thiol compounds as categorized
above, two additional cofactors that make important indirect
use of sulfur are thiamin and biotin. In neither case
is sulfur the element to which transferred C1 groups are
bound, but its importance to the focal carbon or nitrogen
atom is suggested by the complexity of the chemistry
and enzymes involved in its incorporation into these two
cofactors [IM I2T3] . 



1. Biochemical roles and phylogenetic distribution 

Transfer of methyl or methylene groups: The S
atoms of CoM, lipoic acid, and S-adenosyl-homocysteine
accept methyl or methylene groups from the nitrogen
atoms of pterins. Considering that transition-metal sulfide
minerals are the favored substrates for prebiotic
direct-C1 reduction [55, 119, 197], a question of particular
interest is how, in mineral scenarios for the emergence
of carbon fixation, the distinctive relation between
tuned nitrogen atoms in pterins as carbon carriers, and
alkyl-thiol compounds as carbon acceptors, would have
formed.

Reductants and co-reductants: CoB and CoM act
together as methyl carrier and reductant to form methane
in methanogenesis. 47 A similar role as methylene carrier
and reductant is performed by the two SH groups in lipoic
acid. CoM is specific to methanogenic archaea [215],
while lipoic acid and S-adenosyl-homocysteine are found
in all three domains [41, 216]. Lipoic acid is formed from
octanoyl-CoA, emerging from the biotin-dependent malonate
pathway to fatty acid synthesis, and along with
fatty acid synthesis [55], may have been present in the
universal common ancestor. The universal distribution
of the glycine cycle supports this as noted earlier.

Role in the reversal of citric-acid cycling: Lipoic 
acid becomes the electron acceptor in the oxidative decarboxylation
of α-ketoglutarate and pyruvate in the oxidative
Krebs cycle, replacing the role taken by reduced
ferredoxin in the rTCA cycle. Thus the prior availability 
of lipoic acid was an enabling precondition for reversal of 
the cycle in response to the rise of oxygen. 

Carriers of acyl groups: Transport of acyl groups in 
the acyl-carrier protein (ACP) proceeds through thioes- 
terification with pantetheine phosphate, similar to the 
thioesterification in fixation pathways. In fatty acid 



47 In this complex transfer [120], the fully-reduced (Ni+) state of
the nickel tetrapyrrole F430 forms a dative bond to -CH3,
displacing the CoM carrier, effectively re-oxidizing F430 to Ni3+.
Reduced F430 is regenerated through two sequential single-electron
transfers. The first, from CoM-SH, generates a Ni2+
state that releases methane, while forming a radical CoB•-S-S-CoM
intermediate with CoB. The radical then donates the second
electron, restoring Ni+. The strongly oxidizing heterodisulfide
CoB-S-S-CoM is subsequently reduced with two NADH in
a process known as electron bifurcation [214] (described further
below), regenerating CoM-SH and CoB-SH while jointly generating
the low-potential reductant reduced-ferredoxin. Both the
stepwise reduction of F430 and electron bifurcation illustrate the
central role of metals as mediators of single-electron transfer
processes in metabolism.

biosynthesis acyl groups are further processed while attached
to the pantetheine phosphate prosthetic group.

Electron bifurcation: The heterodisulfide bond of
CoB-S-S-CoM has a high midpoint potential (E' ≈
-140 mV), relative to the H+/H2 couple (E' =
-414 mV), and its reduction is the source of free energy
for the endergonic production of reduced ferredoxin
(Fd2-, E' in situ unknown but between -520 mV
and -414 mV) [214], which in turn powers the initial
uptake of CO2 on H4MPT in methanogens. The remarkable
direct coupling of exergonic and endergonic
redox reactions through splitting of binding pairs into
pairs of radicals, which are then directed to paired
high-potential/low-potential acceptors, is known as electron
bifurcation. Variant forms of bifurcation are coming
to be recognized as a widely-used strategy of metal-center
enzymes, either consuming oxidants as energy
sources to generate uniquely biotic low-potential reductants
such as Fd2- [HI [5T7H2T35], or to "titrate" redox
potential to minimize dissipation and achieve reversibility
of redox reactions involving reductants at diverse
potentials, e.g. by combining low-potential (Fd2-,
E' ≈ -420 mV) and high-potential (NADH, E' ≈
-300 mV) reductants to produce intermediate-potential
reductants (NADPH, E' = -360 mV) [220]. Together
with substrate-level phosphorylation (SLP), electron bifurcation
may be the principal chemical mechanism (contrasted
with membrane-mediated oxidative phosphorylation)
for interconverting biological energy currencies, and
along with SLP [209], a mechanism of central importance
in the origin of metabolism [221]. Small metabolites including
such heterodisulfides of cofactors, which can form
radical intermediates exchanging single electrons with Fe-S
clusters (typically via flavins) are essential sources and
repositories of free energy in pathways using bifurcation.
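
The energetics described above can be checked with a short calculation using the relation ΔG = -nFΔE. The sketch below is a minimal illustration under stated assumptions, not a model of any specific enzyme: it takes H2 as the common electron donor (following the comparison to the H+/H2 couple in the text), assumes two electrons per branch, and picks -500 mV for ferredoxin as an arbitrary value inside the quoted range of -520 mV to -414 mV. All function and variable names are our own.

```python
FARADAY = 96.485  # kJ per volt per mole of electrons


def delta_g(e_donor_mv: float, e_acceptor_mv: float, n: int = 2) -> float:
    """Free-energy change (kJ/mol) for n electrons passing from donor to acceptor."""
    return -n * FARADAY * (e_acceptor_mv - e_donor_mv) / 1000.0


# Midpoint potentials in mV, as quoted in the text.
E_H2 = -414.0
E_HETERODISULFIDE = -140.0
E_FD_TITRATION = -420.0
E_NADH = -300.0
E_NADPH = -360.0
# ASSUMPTION: the text gives only a range for ferredoxin in situ; -500 mV is illustrative.
E_FD_ASSUMED = -500.0

# Bifurcation: one electron pair runs downhill, the other is pushed uphill.
dg_exergonic = delta_g(E_H2, E_HETERODISULFIDE)
dg_endergonic = delta_g(E_H2, E_FD_ASSUMED)
dg_bifurcation = dg_exergonic + dg_endergonic

# "Titration": low- and high-potential donors jointly reduce NADP+.
dg_from_fd = delta_g(E_FD_TITRATION, E_NADPH)
dg_from_nadh = delta_g(E_NADH, E_NADPH)
dg_titration = dg_from_fd + dg_from_nadh

print(f"heterodisulfide branch: {dg_exergonic:+.1f} kJ/mol")
print(f"ferredoxin branch:      {dg_endergonic:+.1f} kJ/mol")
print(f"coupled bifurcation:    {dg_bifurcation:+.1f} kJ/mol")
print(f"Fd + NADH -> NADPH:     {dg_titration:+.1f} kJ/mol")
```

Under these assumptions the exergonic branch more than pays for the endergonic one, and the NADPH potential of -360 mV is exactly the mean of -420 mV and -300 mV, so the combined reaction is close to thermodynamically neutral, which is consistent with the reversibility noted in the text.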



2. Participation in carbon fixation pathway modules 

The similarity between the glycine cycle and methano- 
genesis in Fig. [^emphasizes the convergent roles of alkyl- 
thiol cofactors. In the glycine cycle, methylene groups 
are accepted by the terminal sulfur on lipoic acid, and
the subadjacent SH serves as reductant when glycine is
produced, leaving a disulfide bond in lipoic acid. The
disulfide bond is subsequently reduced with NADH. In
methanogenesis, a methyl group from H4MPT is transferred
to CoM, with the subsequent transfer to F430, and
the release from F430 as methane in the methyl-CoM reductase,
coupled to formation of CoB-S-S-CoM. The
heterodisulfide is again reduced with NADH, but employs
a pair of electron bifurcations to retain the excess free
energy in the production of Fd2- rather than dissipating
it as heat [214]. Methanogenesis is thus associated
with 7 distinctive cofactors beyond even the set known
to have diversified functions within the archaea [5], again
suggesting the derived and highly optimized nature of
this Euryarchaeal phenotype. The striking similarity of
these two methyl-transfer systems, mediated by independently
evolved and structurally quite different cofactors,
suggests evolutionary convergence driven specifically by
properties of alkyl thiols.

A curious pattern, which we note but do not attempt 
to interpret, is the association of non-sulfur, nitrogen- 
heterocycle cofactors with WL carbon fixation, con- 
trasted with the use of sulfur-containing heterocycles in 
carboxylation reactions of the rTCA cycle. The non-sulfur
cofactors THF and H4MPT are used in the reactions
of the WL pathway, while the biosynthetically-related
but sulfur-containing cofactor thiamin mediates
the carbonyl insertion (at a thioester) in rTCA [53, 222].
Biotin - which has been generally associated with malonate
synthesis in the fatty-acid pathway (and derivatives
such as propionate carboxylation to methyl-malonate in
3HP [55]) - mediates the subsequent β-carboxylation of
pyruvate and of α-ketoglutarate [86, 223, 224]. Thus
the two cofactors we have identified as using sulfur indirectly
to tune properties of carbon or nitrogen C1-bonding
atoms mediate the two chemically quite different
sequential carboxylations in rTCA.



D. Carboxylation reactions in cofactor synthesis 

Carboxylation reactions can be classified as falling into 
two general categories: those used in core carbon "uptake",
and those used exclusively in the synthesis of specific
cofactors. In addition to carboxylation reactions in
carbon-fixation pathways, the former category includes
the carboxylation of crotonyl-CoA in the glyoxylate regeneration
cycle. Although not an autotrophic pathway,
this reaction does form a distinct entry point for CO2
into the biosphere. The carboxylation of acetyl-CoA to
malonyl-CoA further serves a dual purpose, in being both
the starting point for fatty acid synthesis, as well as a key
step in the 3HP pathway used in several carbon-fixation
pathways. All these carboxylation reactions thus have in
common that they are used at least in some organism as
the central source for cellular carbon. All other carboxylation
reactions that are not used as part of core carbon
uptake, are used in the synthesis of the biotin cofactor,
and the purine and pyrimidine nucleotides (see Fig. [17]).

If we consider the sequences in which CO2 is incorpo- 
rated in these pathways, they also form a distinct class 
of chemistry. In all three cases the resulting carboxyl
group is immediately aminated, either as part of the
carboxylation reaction, or in the following reaction, and
the carboxamide group is subsequently maintained into
the final heterocyclic structure. In addition we previously
saw that IMP becomes the source for the folate
and flavin family (through GTP). Carboxylation reactions
are thus either a general source for cellular carbon
groups in the synthesis of cofactors that are part of the 
catalytic control of core metabolism. 

FIG. 17: Carboxylation reactions in the synthesis of cofactors.
The sequences show the immediate amination of the
carboxyl group to a carboxamide group, which is then pre- 
served into the final heterocyclic structure. As the only car- 
boxylations not used in core carbon uptake, these reaction 
sequences form a distinct class of chemistry. Amination re- 
actions are shown as net additions of ammonia, which may 
be derived from other sources (such as glutamine, aspartate 
or S-adenosyl-methionine). Abbreviations: Alanine (ALA); 
Aspartate (ASP); phosphoribosyl pyrophosphate (PRPP). 



E. The chorismate pathway in both amino acid and 
cofactor synthesis 

Chorismate is the sole source of single benzene rings
in biochemistry [225]. The non-local π-bond resonance is
used in a variety of charge-transfer and electron transfer
and storage functions, in functional groups and cofactors
derived from chorismate. We have noted the charge-transfer
function of PABA in tuning N10 of folates, and
its impact on C1 chemistry. The para-oriented carbonyl
groups of quinones may be converted to partially- or
fully-resonant orbitals in the benzene ring, enabling fully
oxidized (quinone), half-reduced (semiquinone), or fully
reduced (hydroquinone) states [203]. Finally, the aromatic
ring in tryptophan (a second amino acid which
behaves in many ways like a cofactor) has at least one
function in the active sites of enzymes as a mediator of
non-local electron-transfers.



V. INNOVATION: PROMISCUOUS CATALYSIS,
SERENDIPITOUS PATHWAYS



The previous sections argued for the existence of 
low-level chemical and cofactor/catalyst constraints on 
metabolic innovations, and presented evolutionary diver- 
gences that either respected these as constraints, or were 
enabled by the diversification of cofactor and catalytic 
functions. In this section we consider the dynamics by
which innovation occurs, and its main organizing principles.
Innovation in modern metabolism occurs principally
by duplication and divergence of enzyme function
[116, 227, 228]. Often it relies on similarity of functions
among paralogous enzymes, but in some cases may
exploit more distant or accidental overlap of functions.

Innovation always requires some degree of enzymatic 
promiscuity [116], which may be the ability to catalyze
more than one reaction (catalytic promiscuity) or to
admit more than one substrate (substrate ambiguity).
Pathway innovation also requires serendipity [229], which
refers to the co-incidence of new enzymatic function with 
some avenue for pathway completion that generates an 
advantageous phenotype from the new reaction. Most 
modern enzymes are highly specific, 48 but specific en- 
zymes - whether due to structure or due to evolved reg- 
ulation - are of necessity diversified in order to cover 
the broad range of metabolic reactions used in the mod- 
ern biosphere. Serendipitous pathways assembled from 
a diversified inventory of specific enzymes will in most 
cases be strongly historically contingent as they depend 
on either overlap of narrow affinity domains or on "ac- 
cidental" enzyme features not under selection from pre- 
existing functions. Such pathways therefore seem unpre- 
dictable from first principles; whether they are rare will 
depend on the degree to which the diversity of enzyme 
substrate-affinities compensates for their specificity. 

A key question for early metabolic evolution is whether 
the trade-off between specificity and diversity was differ- 
ent in the deep past than in the present, in either degree 
or in structure, in ways that affected either the discov- 
ery of pathway completions or the likelihood that new 
metabolites could be retained within existing networks. 
These structural aspects of promiscuity and serendipity 
determine the regulatory problem faced by evolution in 
balancing the elaboration of metabolism with its preser- 
vation and selection for function. 



1. Creating reaction mechanisms and restricting substrates, 
while evolving genes 



Metabolism is characterized at all levels by a ten- 
sion between creating reaction mechanisms that intro- 
duce new chemical possibilities, and then pruning those 
possibilities by selectively restricting reaction substrates. 
Whether this tension creates a difficult or an easy prob- 
lem for natural selection to solve depends at any time 
on whether the accessible changes in catalytic function, 
starting from integrated pathways, readily produce new 
integrated pathways whose metabolites can be recycled 
in autocatalytic loops. We argue that the conservation of 
pathway mechanisms, particularly when these are defined 
by generic functional groups such as carboxyls, ketones, 



48 However, broad substrate-specificity is no longer considered rare,
and is even explained as an expected outcome in some cases
where costs of refinement are higher than can be supported by
natural selection, and in other cases by positive selection for
phenotypic plasticity [228].






and enols, with promiscuity coming from substrate am- 
biguity with respect to molecular properties away from 
the reacting functional group, favors the kind of orderly 
pathway duplication that we observe in the extant di- 
versity of core metabolism. Therefore we expect that 
serendipitous pathway formation was both facile in those 
instances in the early phases of metabolic evolution where 
innovations in radical-based mechanisms for carbon in- 
corporation occurred, and structured according to the 
same local-group chemistry around which the substrate 
network is organized [67].
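
The distinction drawn above, between a conserved mechanism acting on a local functional group and restriction by properties away from that group, can be made concrete with a toy data structure. This is a deliberately simplified sketch: the class names, the two "enzymes", and the choice of metabolites are hypothetical illustrations, and functional-group membership stands in for all of the real determinants of catalysis.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Molecule:
    name: str
    groups: FrozenSet[str]  # local functional groups present on the molecule
    carbons: int            # crude stand-in for structure away from the reacting group


@dataclass(frozen=True)
class Enzyme:
    name: str
    reacting_group: str                            # group the mechanism acts on
    allowed_names: Optional[FrozenSet[str]] = None  # None means substrate-ambiguous

    def accepts(self, mol: Molecule) -> bool:
        """Mechanism needs the reacting group; specificity may further restrict."""
        if self.reacting_group not in mol.groups:
            return False
        return self.allowed_names is None or mol.name in self.allowed_names


POOL = [
    Molecule("acetate", frozenset({"carboxyl"}), 2),
    Molecule("pyruvate", frozenset({"carboxyl", "ketone"}), 3),
    Molecule("oxaloacetate", frozenset({"carboxyl", "ketone"}), 4),
    Molecule("succinate", frozenset({"carboxyl"}), 4),
    Molecule("alpha-ketoglutarate", frozenset({"carboxyl", "ketone"}), 5),
]

# Hypothetical enzymes sharing one mechanism but differing in substrate restriction.
ambiguous = Enzyme("ketone reductase (ambiguous)", "ketone")
specific = Enzyme("ketone reductase (specific)", "ketone", frozenset({"pyruvate"}))


def substrates(enzyme: Enzyme, pool: List[Molecule]) -> List[str]:
    """Names of pool members the enzyme would act on, in pool order."""
    return [m.name for m in pool if enzyme.accepts(m)]


print(ambiguous.name, "->", substrates(ambiguous, POOL))
print(specific.name, "->", substrates(specific, POOL))
```

In this toy setting the ambiguous catalyst acts on every ketone-bearing molecule in the pool, so the same reaction step is available in parallel on homologous backbones, while the specific catalyst opens only one channel. This is the sense in which substrate ambiguity around a conserved local mechanism favors orderly pathway duplication.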

Modern enzymes both create reaction mechanisms and 
restrict substrates, but the parts of their sequence and 
structure that are under selection for these two cate- 
gories of function may be quite different, so the two func- 
tions can evolve to a considerable degree independently. 
Active-site mechanisms in enzymes for organic reactions 
will often depend sensitively on a small number of highly 
conserved catalytic residues in a relatively fixed geome- 
try, while substrate selection can depend on a wide range 
of properties of enzyme shape or conformation dynamics
[228], on local functional-group properties of the substrate
that have been termed "chemophores" [230], as
well as (in some cases) on detailed relations between the 
substrate and active-site geometry or residues. An ex- 
treme example of the potential for separability between 
reaction mechanism and substrate selection is found in 
the polymerases. A stereotypical reaction mechanism 
of attack on activating phosphoryl groups requires lit- 
tle more than correct positioning of the substrates. In 
the case of DNA polymerases, at least six known cat- 
egories (A, B, C, D, X, and Y) with apparently inde- 
pendent sequence origin have converged on a geometry 
likened to a "right hand" which provides the required 
orientation [231, 232].

At the same time as evolving enzymes needed to pro- 
vide solutions to the biosynthetic problem of enabling 
and regulating metabolic network expansion, they were 
themselves dependent on the evolving capabilities of genomic
and translation systems for maintaining complexity
and diversity. Jensen [227] originally argued that high
enzymatic specificity was no more plausible in primitive 
cells than highly diversified functionality, 49 and that en- 
zymatic promiscuity was both evolutionarily necessary 
and consistent with what was currently known about 



49 This argument was largely a rebuttal of an earlier proposal by
Horowitz [233] for "retrograde evolution" of enzyme functions.
The 1940s witnessed the rise of an overly-narrow interpretation
of "one gene, one enzyme, one substrate, one reaction" (a rigid
codification of what would become Crick's Central Dogma [234]),
which in the context of complex pathway evolution appeared to
be incompatible with natural selection for function of intermediate
states. The Horowitz solution was to depend on an all-inclusive
"primordial soup" [50], in which pathways could grow
backward from their final products, propagating selection stepwise
downward in the pathway until a pre-existing metabolite or
inorganic input was found as a pathway origin.



substrate ambiguity and catalytic promiscuity. Modern
reviews [116, 228, 230] of the mechanisms underlying
functional diversity, promiscuity, and serendipity confirm 
that substrate ambiguity is the primary source of promis- 
cuity that has led to the diversification of enzyme fam- 
ilies. It is striking that, even in cases where substrate 
affinity has been the conserved property while alternate 
reaction mechanisms or even alternate active sites have 
been exploited, it is often local functional groups on one 
or more substrates that appear to determine much of this 
affinity [229].

2. Evidence in our module substructure that early 
innovation was governed principally by local chemistry 

The substructure of modules, and the sequence of innovations,
we have sketched in Sec. [III] appears to be
dominated by substrate ambiguity in enzymes or enzyme
families with conserved reaction mechanisms. The key
reactions in carbon fixation are of two types: Crucial reactions
typically involve metal centers or cofactors that
could have antedated enzymes, and it is primarily reaction
sites, not molecular selectivity, that distinguishes
pathways at the stage of these reactions. 50 The shared
internal sequence of reductions and isomerizations common
to modules (Fig. [8]) is very broadly duplicated,
and the molecular specificity in their enzymes today is 
not correlated with significant reaction-sequence changes 
in the internal structure of pathways. These pathways 
could plausibly function much as they do today with less- 
specific hydrogenases and aconitases. 

A quantitative reconstruction of early evolutionary dy- 
namics will require merging probability models for net- 
works and metabolic phenotypes with those for sequences 
and structure of enzyme families. The goal is a consis- 
tent model of the temporal sequence of ancestral states of 
catalyst families, and of the substrate networks on which 
they acted. 

VI. INTEGRATION OF CELLULAR SYSTEMS 

The features of metabolism that display a "logic" of 
composition, which is then reflected in their evolutionary 
history, are those with few and robust responses to envi- 
ronmental conditions that can be inferred from present 
diversity. These are the subsystems whose evolution has 
been simplified and decoupled by modularity. Their relative
immunity from historical contingency, resulting in
more "thermodynamic" modes of evolution, results from



50 Recall that the enzymes that have been argued to be the ancestral
forms of both the acetyl- and succinyl-CoA ligases and the
pyruvate and α-ketoglutarate biotin-carboxylases show very close
sequence homology [86, 89], suggesting shared ancestral enzymes
for both.

rapid, high-probability convergence in populations that
can share innovations [141].

The larger roles for standing variation and historical 
contingency that are so often emphasized [235] in accounts
of evolutionary dynamics are made possible by
longer-range correlations that link modules, creating mutual
dependencies and restricting viable changes [108,
109]. The most important source of such linkage in extant
life is the unification of metabolic substrates and control
processes within cells [236]. Cellular death or reproduction
couples fitness contributions from many metabolic-phenotype
traits, together with genome replication systems.
This enables the accumulation of diversity as
genomes capture and exploit gains from metabolic control,
complementary specialization [237], and the emergence
of ecological assemblies of specialists as significant
mediators of contingent aspects of evolutionary innovation
(as we illustrated with the example of methylotrophy).

We consider in this section several important ways in 
which aggregation of metabolic processes within cells fol- 
lows its own orderly hierarchy and progression. We note 
that even a single cell does not impose only one type of 
aggregation, but at least three types, and that these are 
the bases for different selection pressures and could have 
arisen at different times. Within cellular subsystems, the 
coupling of chemical processes is often mediated by cou- 
pling of their energy systems, which has probably de- 
veloped in stages we can identify. Finally, even where 
molecular replication is coupled to cellular physiology, in 
the genetic code, strong and perhaps surprising signa- 
tures of metabolic modularity are recapitulated. 



A. Cells provide at least three functionally distinct 
forms of compartmentation 

Under even the coarsest functional abstraction, the cell
provides not one form of compartmentation, but at least
three [238, 239]. The geometry and topology of closed
spheres or shells, and the capacitance and proton impermeability
of lipid bilayers, permit the buildup of pH and
voltage differences, and thus the coupling of redox and
phosphate energy systems through intermediate proton-motive
(or in many cases, sodium-motive) force [240].
The concentration of catalysts with substrates enhances 
reactions that are second-order in organic species, and 
the equally important homeostatic control of the cyto- 
plasm regulates metabolic reaction rates and precludes 
parasitic reactions. Finally, the cell couples genetic vari- 
ations to internal biochemical and physiological varia- 
tions 51 much more exclusively than they are coupled to 



The perspective that this is an active coupling, which defines one 
of the forms of individuality rather than providing a complete 
characterization of the nature of the living state, is supported by 



shared resources such as biofilms or siderophores, leading 
to the different evolutionary dynamics of development 
from niche construction [35]. 52 Each of these forms of 
coupling affects the function and evolution of the mod- 
ules we have discussed. 



1. Coupling of redox and phosphate energy systems may 
have been the first form of compartmentation selected 

Biochemical subsystems driven, respectively, by redox
potential or phosphoanhydride-bond dehydration potential,
cannot usually be directly coupled to one another
due to lack of "transducer" reactions that draw on both
energy systems. 53 The notable exception to this rule is
the exchange of phosphate and sulfur groups in substrate-level
phosphorylation [203] from thioesters (which may
proceed in either direction depending on conditions). Although
it provides a less flexible mode of coupling than
membrane-mediated oxidative phosphorylation, this crucial
reaction type, which occurs in some of the deepest
reactions in biochemistry (those employing CoA, including
all those in the six carbon fixation pathways), has
been proposed as the earliest coupling of redox and phosphate
[209], and the original source of phosphoanhydride
potential [69] enabling pathways that require both reduction
and dehydration reactions.

Phosphate concentration limits growth of many biological
systems today, and phosphate concentrations appear
to be even lower in vent fluids [244] than on average
in the ocean, making it difficult to account for
the emergence of many metabolic steps in hydrothermal 
vent scenarios for the origin of life. Serpentinization and 
other rock-water interactions that produce copious re- 
ductants also scavenge phosphates into mineral form, so 
it appears doubtful that phosphates were abundant in 
the environments otherwise most favorable to geochem- 
ical organosynthesis. What little phosphate is found in 
water is primarily orthophosphate, because the phospho- 
anhydride bond is unstable to hydrolysis. Therefore the 
retention of orthophosphate, and the continuous regeneration
of pyrophosphate and polyphosphates [245–249],
may have been essential to the spread of early life beyond 
very rare geochemical environments. 

The membrane-bound ATP-synthetase, which couples
phosphorylation to a variety of redox reactions [5]



the complex ecosystems including viral RNA and DNA that are
partly autonomous of the physiology of particular cells [241, 242].
For an argument that somatic development and niche construction
are variants on a common process, distinguished by the
genome's level of control and exploitation of the constructed
resources, see Ref. [243].

In addition to the ultimate physical constraint of limits to free 
energy, biochemistry operates under additional proximate con- 
straints not only from availability of free energy but from the 
chemical and quantum-mechanical substrates in which it is car- 
ried. 
through proton or sodium pumping, is therefore essential
in nearly all biosynthetic pathways, and must have been
among the first functions of the integrated cell. Without
a steady source of phosphate esters, none of the three
oligomer families could exist. The ATP synthetase itself
is homologous in all organisms, providing one strong argument
(among many [85, 250]) for a membrane-bound
last common ancestor. Proton-mediated phosphorylation
(best known through oxidative phosphorylation in the
respiratory chain [203]) requires a topologically enclosed
space to function as a proton capacitor [240]. However,
as shown by gram-negative bacteria [5] and their de- 
scendants mitochondria and plastids, which acidify the 
periplasmic space or thylakoid lumen, the proton capaci- 
tor need not be (and generally is not) the same compart- 
ment as the cytoplasm containing enzymatic reactions. 
Because the coupling of energy systems is a different 
function from regulating reaction rates catalytically, the 
phosphorylation system should not generally have been 
subject to the same set of evolutionary pressures and 
constraints as other cellular compartments, and need not 
have arisen at the same time. We note that, because 
it may have lower osmotic pressure than the cytoplasm, 
the acidified space required for proton-driven phosphorylation
may not have required a cell wall, greatly reducing
the number of concurrent innovations required
for compartmentalization, compared to those for the cytoplasm.
Therefore we conjecture that proton-mediated
phosphorylation could have been the first function lead- 
ing to selection for lipid-bilayer compartmentalization, 
allowing other cellular functions to accrete at later times. 



2. Regulation of biosynthetic rates may have been
prerequisite for the optimization of loop-autocatalytic cycles

The second function of cellular compartments, and the 
one most emphasized in vesicle theories of the origin of 
life [251, 252], is the enhancement of second-order reactions
by collocation of catalysts and their substrates.
Here we note another role that we have not seen men- 
tioned, which is more closely related to the functions 
of the cell that inhibit reactions. Organisms employing 
autocatalytic-loop carbon fixation pathways must reli- 
ably limit their anabolic rates to avoid drawing off excess 
network catalysts into anabolism, resulting in passage be- 
low the autocatalytic threshold for self-maintenance, and 
collapse of carbon fixation and metabolism. Regulating 
anabolism to maintain viability and growth may have 
been an early function of cells. 

We noted in Sec. III D 4 the fragility of autocatalytic-loop
pathways to parasitic side-reactions, and the way
the addition of a linear pathway such as WL stabilizes
loop autocatalysis in the root node of Fig. 11. 54 It may



be that the optimizations in either branch of the carbon- 
fixation tree were not possible until rates of anabolism 
were sufficiently well-regulated to protect supplies of loop 
intermediates or essential cofactors. Therefore, while the 
root node is plausible as a pre-cellular [56] or an early 
cellular (but non-optimized) form, either branch from it 
may have required the greater control afforded by quite 
refined cellular regulation of reaction rates. 



B. Coupling of metabolism to molecular 
replication, and signatures of chemical regularity in 
the genetic code 

Among the subsystems coupled by modern cells, per- 
haps none is more elaborate than the combined appara- 
tus of amino acid and nucleotide biosynthesis and pro- 
tein coding. The most remarkable chemical aspect of the 
protein-coding system is that it is an informational sys- 
tem: a sophisticated machinery of transcription, tRNA 
formation and aminoacylation, and ribosomal translation 
separates the chemical properties of DNA and RNA from 
those of proteins, permitting almost free selection of se- 
quences in both alphabets in response to requirements of 
heredity and protein function. 55 The interface at which 
this separation occurs is the genetic code. From the in- 
formational suppression of chemical details that defines 
the coding system, the code itself might have been ex- 
pected to be a random map, but empirically the code is 
known to contain many very strong regularities related 
to amino acid biosynthesis and chemical properties, and 
perhaps to the evolutionary history of the aminoacyl- 
tRNA synthetases. 

Many explanations have been advanced for redundancy 
in the genetic code, as a source of robustness of protein 
properties against single-point mutations [141, 253–255],
but in all of these the source of selection originates in 
the elaborate and highly evolved function of coding it- 
self. In many cases the redundancy of amino acids at 
adjacent coding positions reflects chemical or structural 
similarities, consistent with this robustness-criterion for 
selection, but in nearly all cases redundancy of bases in 
the code correlates even more strongly with shared ele- 
ments of biosynthetic pathways for the amino acids. The 
co-evolutionary hypothesis of Wong [256] accounts for 
the correlation of the first base-position with amino-acid 
backbones as a consequence of duplication and diver- 



For proto-metabolism, spontaneous abiotic side-reactions may
be hazardous, if catalysts in the main fixation pathway do not
sufficiently accelerate their reaction rates, creating a separation
of timescales relative to the uncatalyzed background. Within the
first cells, the same hazard is posed by secondary anabolism, as
its reaction rates become enhanced by catalysts similar to those
in the core. This fact was clearly noted already in Ref. [56].
55 The observation that enzymes acting on DNA have evolved to ac- 
tively mitigate chemical differences in the bases, to enable a more 
nearly neutral combinatorial alphabet, is due to Peter Schuster 
(pers. comm.) 
gence of amino acid biosynthetic enzymes together with
aminoacyl-tRNA synthetases (aaRS). The stereochemi- 
cal hypothesis of Woese [257] addresses a correlation of 
the second coding position with a measure of hydropho- 
bicity called the polar requirement. The remarkable fact 
that both correlations are highly significant relative to 
random assignments, but that they are segregated be- 
tween first and second codon bases, is not specifically ad- 
dressed in either of these accounts. Copley et al. [205] ad- 
dress the same regularities as both the Wong and Woese 
hypotheses, but link them to much more striking redun- 
dancies in biosynthetic pathways, which they propose are 
consequences of small-molecule organo-catalytic roles of 
dimer RNA in the earliest biosynthesis of amino acids.

We note here a further chemical regularity in the ge- 
netic code, which falls outside the scope of the previous 
explanations: purines in second base-position code for 
several amino acids that either use related purine-derived 
cofactors in their biosynthetic pathways, or are directly 
related to the codon. This association is much more 
comprehensive for G-second codons than for A-second 
codons, and it does not suggest the same kinds of mech- 
anistic relations in the two cases. However, it further 
compresses the description of patterns in the code that 
were not addressed in Ref. [205], in terms of similar chemical
and biosynthetic associations.

A strong correlation, of a single kind, is found between
the glycine cycle for amino acid biosynthesis from C1
groups on folate cofactors, and codons XGX, where X
is any base and G is guanine. (In what follows we abbreviate
wobble-base positions y for pyrimidines U and
C, or u for purines A or G.) This group includes glycine
(GGX), serine (AGy), cysteine (UGy), and tryptophan
(UGu). 56 We do not propose a specific mechanism for
such an association here, but our earlier argument that
folates would have been contemporaneous with GTP suggests
that biosynthesis through the glycine cycle was the
important source of these amino acids at the time they
became incorporated into the code. Some of these amino
acids satisfy multiple regularities, as in the correlation
of glycine with GXX ↔ reductive transamination, or
cysteine with UXX ↔ pyruvate backbone, proposed in
Ref. [205].

The position (CAy) of histidine, synthesized from 
ATP, is the only case we recognize of a related corre- 
lation in XAX codons. For this position, the availability 
of ATP seems to have been associated with the synthesis 
of histidine directly through the cyclohydrolase function 
(rather than through secondary cofactor functions), at 
the time this amino acid became incorporated into the 
code. 
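The codon groupings named in the last two paragraphs can be checked by enumerating the standard genetic code. The following is a minimal sketch, not part of the original analysis: the names `CODON_TABLE` and `codons_by_second_base` are hypothetical, and the table is the standard nuclear code written with U in place of T. Note that the standard code also assigns arginine and one stop codon to XGX positions, which the discussion above does not address.

```python
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code; first, second and third bases each run over U, C, A, G,
# with the third base varying fastest. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons_by_second_base(second):
    """Group all codons with the given second base by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if codon[1] == second:
            groups[amino_acid].append(codon)
    return dict(groups)


if __name__ == "__main__":
    # List the XGX column of the code (one-letter amino acid symbols).
    for amino_acid, codons in sorted(codons_by_second_base("G").items()):
        print(amino_acid, " ".join(codons))
```

Calling `codons_by_second_base("A")["H"]` in the same way returns the two histidine codons CAy discussed above.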

Much more than correlation is required to impute cau- 
sation, and all existing theories of cause for regularities 



Both purines are used in the mitochondrial code and only UGG 
in the nuclear code. 



in the genetic code are either highly circumstantial or 
require additional experimental support. Therefore we 
limit the aspects of these observations that we regard as 
significant to the following three points: 
The existence of a compression: The idealized adap- 
tive function of coding is to give maximum evolution- 
ary plasticity to aspects of phenotype derived from pro- 
tein sequence, uncoupled from constraints of underlying 
biosynthesis. The near-wholesale transition from organic 
chemistry to polymer chemistry around the C20 scale sug- 
gests that this separation has been effectively maintained 
by evolution. Strong regularities which make the descrip- 
tion of the genetic code compressible relative to a random 
code reflect failures of this separation which have trans- 
mitted selection pressure across levels, during either the 
emergence or maintenance of the code. These include 
base-substitution errors, whether from mutation or in the 
transcription and translation processes, but also appar- 
ently chemical relations between nucleobases and amino 
acids. 

The segregation of the roles of different base po- 
sitions and in some cases different bases in terms 
of their biochemical correlates: The genetic code is 
like a "rule book" for steps in the biosynthesis of many 
amino acids, 57 but the chemical correlations which are 
its rules are of many kinds. Beyond the mere existence 
of those rules, and their collective role as indices of reg- 
ularity threading the code, we must explain why rules 
of different kinds are so neatly segregated over different 
base positions and sometimes over different bases (as in 
the XGX and XAX codons). 

A compression that references process rather 
than property: The role of biosynthetic pathways as 
correlates of regularities makes this compression of the 
genetic code a reference to the process and metabolic 
network context within which amino acids are produced, 
and not merely to their properties. (Many of the chemi- 
cal properties recognized as criteria of selection, whether 
size or hydrophobicity, are shared at least in part be- 
cause they result from shared substrates or biosynthetic 
steps.) We think of the function of coding as separat- 
ing biosynthetic process from phenotype: transcription 
and translation are "Markovian" in the sense that the 
only information from the biosynthetic process which sur- 
vives to affect the translated protein is what is inherent 
in the structure of the amino acid. 58 Thus selection on 
post-translation phenotypes should only be responsive to 
the finished amino acids. The existence of regularities in 
the genetic code which show additional correlation with 



The correlations in the code may be understood as rules be- 
cause the biosynthetic pathways may be placed on a decision tree, 
with branches labeling alternative reactions at several stages of 
synthesis, and branching directions indicated by the
position-dependent codon bases [205].

In technical terms, one says that the phenotype is conditionally 
independent of the biosynthetic pathway, given the amino acid. 
intermediate steps in the biosynthetic process therefore
requires either causes other than selection on the
post-coding phenotype (including its robustness), or a history-
dependence in the formation of the code that reflects ear- 
lier selection on intermediate pathway states. If they re- 
flect causal links to metabolic chemistry, these "failures" 
of the separation between biosynthetic constraint and se- 
lection of polymers for phenotype may have broken down 
the emergence of molecular replication into a sequence of 
simpler, more constrained, and therefore more attainable 
steps. 
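The "Markovian" separation invoked in this argument can be written as a conditional-independence statement. This is only a restatement in symbols of the verbal definition given above; the random-variable names (phenotype, amino acid, pathway) are labels chosen here for illustration.

```latex
% Conditional independence of phenotype and biosynthetic pathway, given the amino acid
P(\text{phenotype} \mid \text{amino acid},\ \text{pathway})
  = P(\text{phenotype} \mid \text{amino acid})
```

Regularities in the code that correlate with intermediate pathway steps are then the cases in which selection on the left-hand side alone could not have produced the observed pattern.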



VII. CONCLUSIONS 

We have argued that the fundamental problem of elec- 
tron transfer in aqueous solution leads to a qualitative 
division between catalytically "hard" and "easy" chem- 
istry, and that this division in one form or another has 
led to much of the architecture and long-term evolution of 
the biosphere. Hard chemistry involves electron transfers 
whose intermediate states would be unstable or energet- 
ically inaccessible in water if not mediated by transition- 
metal centers in metal-ligand complexes. Easy chemistry 
involves hydrogenations and hydrations, intramolecular 
redox reactions, and a wide array of acid-base chemistry. 
Easy chemistry is promiscuously re-used and provides 
the internal reactions within modules of core metabolism. 
Hard chemistry defines the module boundaries and the 
key constraints on evolutionary innovation. These sim- 
ple ideas underlie a modular decomposition of carbon 
fixation that accounts for all known diversity, largely in 
terms of unique adaptations to chemically simple vari- 
ations in the abiotic environment. On the foundation 
of core metabolism laid by carbon fixation, the remain- 
der of biosynthesis is arranged as a fan of increasingly 
independent anabolic pathways. The unifying role of 
the core permits diverse anabolic pathways to indepen- 
dently reverse and become catabolic, and the combina- 
torics of possible reversals in communities of organisms 
determines the space of evolutionary possibilities for het- 
erotrophic ecology. 

We have emphasized the role of feedback in biochemistry,
which takes different forms at several levels. Network
autocatalysis, if we take as a separate question
the origin of external catalytic and cofactor functions,
is found to be a property internal to the small-molecule
substrate networks for many core pathways. A qualitatively
different form of feedback is achieved through
cofactors, which may act either as molecular or as network
catalysts. As network catalysts they differ from
small metabolites because their internal structure is not 
changed except at one or two bonds, over the reactions 
they enable. The cofactors act as "keys" that incorpo- 
rate domains of organic chemistry within biochemistry, 
and this has made them both extraordinarily productive 
and severely limiting. No extant core pathways func- 
tion without cofactors, and cofactor diversification ap- 



pears to have been as fundamental as enzyme diversifica- 
tion in some deep evolutionary branches. We have there- 
fore argued for a closely linked co-evolution of cofactor 
functions with the expansion of the universal metabolic 
network from inorganic inputs, and attempted to place 
key cofactor groups within the dependency hierarchy of 
biosynthetic pathways, particularly in relation to the first 
ability to synthesize RNA. 

The most important message we hope to convey is the 
remarkable imprint left by very low-level chemical con- 
straints, even up to very high levels of biological orga- 
nization. Only seven carbon fixation modules, mostly 
determined by distinctive, metal-dependent carboxyla- 
tion reactions, cover all known phylogenetic diversity and 
provide the building blocks for both autotrophic and heterotrophic
metabolic innovation. A similar, small collection
of organic or organometallic cofactor families have
been the gateways that determine metabolic network 
structure from the earliest cells to the present. The num- 
ber of these cofactors that we consider distinct may be 
somewhat further reduced if we recognize biosynthetic 
relatedness that leads to functional relatedness (as in the 
purine-derived or chorismate-derived cofactors), or cases 
of evolutionary convergence dominated by properties of 
elements (as for lipoic acid and the CoB-CoM system). 

We believe that these regularities should be under- 
stood as laws of biological organization. In a proper, 
geochemically-embedded theory of the emergence of 
metabolism, they should be predictable, either as particular
forms as in the case of metal chemistry or convergent uses
of nitrogen and sulfur, or as properties of distributions as in
the use of network modules or the diversity of cofactors.
Moreover, this lawfulness should have been expected: the 
factors that reduce (or encrypt) the role of laws in biol- 
ogy, and lead to unpredictable historical contingencies, 
arise from long-range correlation. Correlation of multi- 
ple variables leads to large spaces of possibility and en- 
tangles the histories of different traits, making the space 
difficult to sample uniformly. But correlation in biology 
is in large part a constructed property; it has not been 
equally strong in all eras and its persistence depends on 
timescales. Long-term evolution permits recombination 
even in modern integrated cells and genomes. Early life, 
in contrast, with its less-integrated cells and genomes, 
and its more loosely-coupled traits, had constructed less 
long-range correlation. These are the domains where the 
simpler but invariant constraints of underlying chemistry 
and physics should show through. 
Appendix A: Bipartite graph representations for
chemical reaction networks 



The stoichiometry of a chemical reaction may be represented
by a directed hypergraph [258]. A hypergraph
differs from a simple graph in that, where each edge of a 
simple graph has two points as its boundary, in a hyper- 
graph, a hyper-edge may have a set of points as its bound- 
ary. In a directed hypergraph, the input and output sets 
in the boundary are distinguished. For the application to 
chemistry presented here, each hyper-edge corresponds to 
a reaction, and its input and output boundary sets cor- 
respond to moles of the reactant and product molecules. 
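As an illustration of the directed-hypergraph picture, the sketch below stores one hyper-edge per reaction, with its input and output boundary sets kept as multisets of moles. The class name `Reaction` and the helper `species_of` are hypothetical names introduced here; the example reaction is the simplified reductive carboxylation of acetate used later in this appendix.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Reaction:
    """One directed hyper-edge: input and output boundary sets with mole counts."""
    label: str
    inputs: Counter
    outputs: Counter

    def net_stoichiometry(self):
        """Net change in moles of each species per unit extent of reaction."""
        net = Counter()
        for species, moles in self.outputs.items():
            net[species] += moles
        for species, moles in self.inputs.items():
            net[species] -= moles
        return {species: n for species, n in net.items() if n != 0}


def species_of(network):
    """Collect the points (chemical species) touched by any hyper-edge."""
    nodes = set()
    for reaction in network:
        nodes |= set(reaction.inputs) | set(reaction.outputs)
    return nodes


# Reductive carboxylation of acetate, in the simplified CHO projection.
b = Reaction(
    label="b",
    inputs=Counter({"ACE": 1, "CO2": 1, "H2": 1}),
    outputs=Counter({"PYR": 1, "H2O": 1}),
)

if __name__ == "__main__":
    print(b.net_stoichiometry())
    print(sorted(species_of([b])))
```

Keeping mole counts in the boundary multisets mirrors the later convention of drawing one solid line per mole of reactant or product.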

It is possible to display the hypergraphs represent- 
ing chemical reactions as doubly-bipartite simple graphs, 
meaning that both nodes and edges exist in two types, 
and that well-formed graphs permit only certain kinds 
of connections of nodes to edges. The bipartite graph 
representation of a reaction has an intuitive similarity 
to the conventional chemical-reaction notation (shown 
in Eq. (A1) below), but it makes more explicit reference
to the chemical mass-action law as well as to the
reaction stoichiometry. For appropriately constructed 
graphs, graph-rewrite rules correspond one-to-one with 
evaluation steps of mass-action kinetics, permitting sim- 
plification of complex reaction networks to isolate key 
features, while retaining correspondence of the visual and 
mathematical representations. 

We use graph representations of reaction networks in 
the text where we need to show relations among multiple 
pathways that may connect the same inputs and outputs 
(such as acetyl-CoA and succinyl-CoA), and may draw 
from the same input and output species (such as CO2, re- 
ductant, and water). Parallel input and output sequences 
appear as "ladder" topology in these graphs, and for the 
particular pathways of biological carbon fixation, this is 
due to the recurrence of identical functional-group re- 
action sequences in multiple pathways, as discussed in 
Sec. III B.

In this appendix we define the graph representation 
used in the text, introduce graph-reduction procedures 
and prove that they satisfy the mathematical property 
of associativity, and provide solutions for the particular 
simplification of interacting rTCA and Wood-Ljungdahl 
pathways in a diluting environment. 



1. Definition of graphic elements 

a. Basic elements and well-formed graphs 

The elements in a bipartite-graph representation of a 
chemical reaction or reaction network are defined as fol- 
lows: 



• Filled dots represent concentrations of chemical
species. Each such dot is given a label indicating
the species, such as [ACE],
used to refer to acetate in the text.

• Dashed lines represent transition states of reactions.
Each is given a label indicating the reaction,
such as b.

• Hollow circles indicate inputs or outputs between
molecular species and transition states
[graphic: a hollow circle with line stubs labeled ACE and CO2].
Each circle is associated with the complex of reactants
or products of the associated reaction, indicated
as labeled line stubs.

• Hollow circles are tied to molecular concentrations
with solid lines, one line per mole of reactant
or product participating in the reaction. (That is,
if m moles of a species A enter a reaction b, then
m lines connect the dot corresponding to [A] to the
hollow circle leading into reaction b. This choice
uses graph elements to carry information about stoichiometry,
as an alternative to labeling input- or
output-lines to indicate numbers of moles.)

• Full reactions are defined when two hollow circles
are connected by the appropriate transition state
[graphic: two hollow circles, with stubs including ACE, CO2,
PYR and H2O, joined by a dashed transition-state line],
describing the reductive carboxylation of acetate to
form pyruvate.

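• The bipartite graph for a fully specified reaction
takes the form

[graphic: bipartite graph connecting ACE, CO2 and H2 through a transition state to PYR and H2O]

$\mathrm{ACE} + \mathrm{CO_2} + \mathrm{H_2} \rightleftharpoons \mathrm{PYR} + \mathrm{H_2O}$   (A1)

where labeled stubs are connected to filled circles
by mole-lines. The bipartite graph corresponds to
the standard chemical notation for the same reaction
as shown.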
b. Assignment of graph elements to terms in the
mass-action rate equation

The mass-action kinetics for a graph such as the reductive
carboxylation of acetate 59 is given in terms of two
half-reaction currents, which we may denote with the reaction
label and an arbitrary sign as

$j_b^{+} = k_b\,[\mathrm{ACE}]\,[\mathrm{CO_2}]\,[\mathrm{H_2}]$
$j_b^{-} = \bar{k}_b\,[\mathrm{PYR}]\,[\mathrm{H_2O}]$.   (A2)

$k_b$ and $\bar{k}_b$ denote the forward and reverse half-reaction
rate constants. The total reaction current $J_b = j_b^{+} - j_b^{-}$ is
related to the contribution of this reaction to the changes
in concentration as

$\dot{[\mathrm{ACE}]} = \dot{[\mathrm{CO_2}]} = \dot{[\mathrm{H_2}]} = -J_b$
$\dot{[\mathrm{PYR}]} = \dot{[\mathrm{H_2O}]} = J_b$,   (A3)

where the overdot denotes the time derivative. Reaction
currents on graphs do not have inherent directions,
reflecting the microscopic reversibility of reactions. All
sources of irreversibility are to be made explicit in the
chemical potentials that constitute the boundary conditions
for reactions.

Each term in the mass-action rate equation may be
identified with a specific graphical element in the bipartite
representation. The half-reaction rate constants $k_b$,
$\bar{k}_b$ are associated with the hollow circles, and the current
$J_b$ (which is the time-derivative of the coordinate giving
the "extent of the reaction") is associated with the
transition-state dashed line. Concentrations, as noted,
are associated with filled dots, and stoichiometric coefficients
are associated with the multiplicities of solid lines.
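The rate equations (A2) and (A3) can be evaluated directly. The sketch below is illustrative only: the function names and all numerical values are assumptions made here, chosen so the arithmetic is easy to follow, and they carry no chemical meaning.

```python
def half_reaction_currents(conc, k_fwd, k_rev):
    """Forward and reverse half-reaction currents of Eq. (A2)."""
    j_plus = k_fwd * conc["ACE"] * conc["CO2"] * conc["H2"]
    j_minus = k_rev * conc["PYR"] * conc["H2O"]
    return j_plus, j_minus


def concentration_rates(conc, k_fwd, k_rev):
    """Time derivatives of the concentrations from Eq. (A3)."""
    j_plus, j_minus = half_reaction_currents(conc, k_fwd, k_rev)
    total_current = j_plus - j_minus  # J_b
    return {
        "ACE": -total_current,
        "CO2": -total_current,
        "H2": -total_current,
        "PYR": total_current,
        "H2O": total_current,
    }


if __name__ == "__main__":
    # Arbitrary illustrative concentrations and rate constants.
    concentrations = {"ACE": 1.0, "CO2": 2.0, "H2": 3.0, "PYR": 4.0, "H2O": 1.0}
    print(concentration_rates(concentrations, k_fwd=2.0, k_rev=0.5))
```

Each factor in `half_reaction_currents` corresponds to one graph element: the rate constants to the hollow circles, and each concentration factor to one solid line from a filled dot.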



2. Graph reduction for reaction networks in steady state

Networks of chemical reactions in steady state satisfy 
the constraints that the input and output currents to 
each chemical species (including any external sources or 
sinks) sum to zero. These constraints are the basis of stoichiometric
flux-balance analysis [259–262], but they can
also be used to eliminate internal nodes as explicit vari- 
ables, leading to lumped-parameter expressions for entire 
sub-networks as "effective" vertices or reactions. With 



appropriate absorption of externally buffered reagents 
into rate constants, this network reduction can be done 
exactly, without loss of information. An example of such 
a reduction is the Michaelis-Menten representation of
multiple substrate binding at enzymes. Systematic methods
for network reduction were one motivation behind
Sinanoglu's graphic methods [263, 264]. More sophisticated
stochastic approaches have recently been used to
include fluctuation properties in effective vertices, gener- 
alizing the Michaelis relation beyond mean field [265]. 
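The steady-state constraint described above, that currents into and out of every species sum to zero, can be sketched with a stoichiometric matrix. The example network is an assumption made for illustration: a linear chain A -> X -> B with an external source feeding A and an external sink draining B. The function names are hypothetical.

```python
def net_production(stoich, fluxes):
    """Return the product S.v: the net rate of change of each species."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-12):
    """True when every species has balanced input and output currents."""
    return all(abs(rate) <= tol for rate in net_production(stoich, fluxes))


# Rows: species A, X, B.
# Columns: source (-> A), reaction a (A -> X), reaction b (X -> B), sink (B ->).
S = [
    [1, -1, 0, 0],
    [0, 1, -1, 0],
    [0, 0, 1, -1],
]

if __name__ == "__main__":
    print(is_steady_state(S, [2, 2, 2, 2]))  # balanced currents
    print(net_production(S, [2, 2, 1, 1]))   # X accumulates
```

For a linear chain the constraint forces all currents to be equal, which is the property that allows the internal species X to be eliminated as an explicit variable.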

The map we have given of mass-action rate parameters 
to graphic elements allows us to represent steady-state 
network reduction in terms of graph reduction. In this 
approach, rewrite rules for the removal of graph elements 
are mapped to composition rules for half-reaction rate 
constants and stoichiometric coefficients. These compo- 
sition rules can be proved to be associative, leading to an 
algebra for graph reduction. Here we sketch the rewrite 
rules relevant to reduction of the citric-acid cycle graph. 
In the next subsection we will reduce the graph, to the 
form used in the text. 



a. The base composition rule for removal of a single 
internal species 

The simplest reduction is removal of an intermediate 
chemical species that is the sole output to one reaction, 
and the sole input to another, in a linear chain. Exam- 
ples in the TCA cycle include malate (MAL) and isoc- 
itrate (ISC), produced by reductions and consumed by 
dehydrations. They also include citrate (CIT) itself, pro- 
duced by the hydration of aconitate and consumed by 
retro-aldol cleavage. 

For a single linear reaction as shown in Fig. 18, the
mass-action law is

$[A]\,k_a - [B]\,\bar{k}_a = J_a$,   (A4)

and concentrations change as

$\dot{[A]} = -J_a$
$\dot{[B]} = J_a$.   (A5)

The equilibrium constant for the reaction A → B is

$K_{A \to B} = k_a / \bar{k}_a$.   (A6)


All examples in this appendix use the same simplified projec- 
tion onto the CHO sector that is used in diagrams in the main 
text. Actual reaction free energies will be driven by coupled en- 
ergies of hydrolysis of ATP or oxidation of thiols to thioesters. 
The graph-reduction methods described in the next section may
be used to incorporate such effects into lumped-parameter
representations of multi-reagent reaction sequences that regenerate
energetic intermediates such as ATP or CoA in a network where
these are made explicit.



FIG. 18: Basic reaction graph. [A] and [B] are concentrations
associated with the two colored nodes. Forward and backward
rate constants $k_a$ and $\bar{k}_a$ are associated with the two
unfilled circles. The associated reaction state current is $J_a$.

For two such reactions in a chain, as shown in Fig. 19,
the mass-action laws are

$$ \begin{aligned}
[A]\, k_a - [X]\, \bar{k}_a &= J_a \\
[X]\, k_b - [B]\, \bar{k}_b &= J_b ,
\end{aligned} \tag{A7} $$

and the conservation laws become

$$ \begin{aligned}
\dot{[A]} &= -J_a \\
\dot{[X]} &= J_a - J_b \\
\dot{[B]} &= J_b .
\end{aligned} \tag{A8} $$

FIG. 19: Removal of an internal species X from a diagram
with elementary reactions. Rate constant pairs $(k_a, \bar{k}_a)$,
$(k_b, \bar{k}_b)$ are used to define new rate constants
$(k_{ab}, \bar{k}_{ab})$ for the effective transition state ab.



FIG. 20: Composition of three reactions a, b, c can proceed
by elimination of either X or Y first.



Under the steady-state condition $\dot{[X]} = 0$, we wish to
replace Eqs. (A7, A8) with a rate equation

$$ [A]\, k_{ab} - [B]\, \bar{k}_{ab} = J_{ab} \tag{A9} $$

and a conservation law expressed in terms of $J_a = J_{ab} = J_b$.
The rate constants in Eq. (A9) are to be specified
through a composition rule

$$ (k_a, \bar{k}_a) \circ (k_b, \bar{k}_b) = (k_{ab}, \bar{k}_{ab}) \tag{A10} $$

derived from the graph rewrite. Removing [X] from the
mass-action equations using $\dot{[X]} = 0$, we derive that the
rate constants satisfying Eq. (A9) are given by

$$ \begin{aligned}
k_{ab} &= \frac{k_a k_b}{\bar{k}_a + k_b} \\
\bar{k}_{ab} &= \frac{\bar{k}_a \bar{k}_b}{\bar{k}_a + k_b} .
\end{aligned} \tag{A11} $$

The associated equilibrium constant correctly satisfies
the relation

$$ \frac{k_{ab}}{\bar{k}_{ab}} = \frac{k_a k_b}{\bar{k}_a \bar{k}_b} . \tag{A12} $$

b. Associativity of the elementary composition rule

The composition rule (A11) is associative, meaning
that internal nodes may be removed from chains of reactions
in any order, as shown in Fig. 20. All composition
rules derived in the remainder of this appendix will be
variants on the elementary rule (with additional buffered
concentration variables added), so we demonstrate
associativity for the base case as the foundation for other
cases.

From Eq. (A11) for $(k_a, \bar{k}_a) \circ (k_b, \bar{k}_b)$, followed by the
equivalent expressions for $(k_{ab}, \bar{k}_{ab}) \circ (k_c, \bar{k}_c)$,
$(k_a, \bar{k}_a) \circ (k_{bc}, \bar{k}_{bc})$, and $(k_b, \bar{k}_b) \circ (k_c, \bar{k}_c)$,
we derive the sequence of reductions

$$ \begin{aligned}
k_{abc} &= \frac{k_{ab} k_c}{\bar{k}_{ab} + k_c} \\
&= \frac{k_a k_b k_c}{\bar{k}_a \bar{k}_b + (\bar{k}_a + k_b)\, k_c} \\
&= \frac{k_a k_b k_c}{\bar{k}_a (\bar{k}_b + k_c) + k_b k_c} \\
&= \frac{k_a k_{bc}}{\bar{k}_a + k_{bc}} ,
\end{aligned} \tag{A13} $$

and a similar equation follows for $\bar{k}_{abc}$. Thus we have

$$ [(k_a, \bar{k}_a) \circ (k_b, \bar{k}_b)] \circ (k_c, \bar{k}_c) = (k_a, \bar{k}_a) \circ [(k_b, \bar{k}_b) \circ (k_c, \bar{k}_c)] . \tag{A14} $$
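The elementary rule and its associativity can also be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names (`compose`, `direct_current`) and all numerical values are illustrative assumptions. It compares the reduced current of Eq. (A9) with the current obtained by solving the steady-state condition for [X] explicitly, and then compares the two orders of elimination for a three-reaction chain.

```python
from math import isclose

def compose(first, second):
    """Elementary composition rule for two (forward, backward) rate-constant pairs."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = ka_bar + kb
    return (ka * kb / denom, ka_bar * kb_bar / denom)

def direct_current(conc_a, conc_b, first, second):
    """Current through A <-> X <-> B, found by solving d[X]/dt = 0 explicitly."""
    ka, ka_bar = first
    kb, kb_bar = second
    conc_x = (conc_a * ka + conc_b * kb_bar) / (ka_bar + kb)
    return conc_a * ka - conc_x * ka_bar

# Arbitrary illustrative values (assumptions, not taken from the text).
a, b, c = (2.0, 0.5), (3.0, 1.5), (0.7, 0.2)
conc_a, conc_b = 1.3, 0.4

# Reduced rate law versus explicit steady-state elimination of X.
k_ab, k_ab_bar = compose(a, b)
j_reduced = conc_a * k_ab - conc_b * k_ab_bar
j_direct = direct_current(conc_a, conc_b, a, b)

# Two orders of elimination for the chain a, b, c.
x_first = compose(compose(a, b), c)   # remove X, then Y
y_first = compose(a, compose(b, c))   # remove Y, then X

print(isclose(j_reduced, j_direct))
print(all(isclose(p, q) for p, q in zip(x_first, y_first)))
```

Both comparisons should agree to floating-point precision for any positive rate constants, since the rule is an exact algebraic consequence of the steady-state condition.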



c. Removal of internal nodes that require other inputs or 
outputs 

Next we consider the elimination of an internal node 
[X] that is produced or consumed together with other 
products or reactants. Conservation $\dot{[X]} = 0$ implies
relations among the currents of these other species as well.
All remaining graph reductions that we will perform for 
the TCA cycle are of this kind. In some cases both the 
secondary product and reactant are the solvent (water), 
as in the aconitase reactions (repeated in TCA, 3HB, 
4HB, and bicycle pathways). In other cases they are re- 
ductants or inputs such as CO2 that we consider buffered 
in the environment. 

The pair of mass-action equations we wish to reduce
is⁶⁰

$$ \begin{aligned}
[A]\, k_a - [X]\,[C]\, \bar{k}_a &= J_a \\
[X]\,[D]\, k_b - [B]\, \bar{k}_b &= J_b ,
\end{aligned} \tag{A15} $$

and the desired reduced form is

$$ [A]\,[D]\, k_{ab} - [C]\,[B]\, \bar{k}_{ab} = J_{ab} . \tag{A16} $$

We first reduce Eq. (A15) to the base case of the previous
section, by absorbing the concentrations not to be
removed into a pair of effective rate constants

$$ \begin{aligned}
[A]\, k_a - [X] \left( [C]\, \bar{k}_a \right) &= J_a \\
[X] \left( [D]\, k_b \right) - [B]\, \bar{k}_b &= J_b .
\end{aligned} \tag{A17} $$

From these we derive a composition equation

$$ [A]\, \hat{k}_{ab} - [B]\, \hat{\bar{k}}_{ab} = J_{ab} , \tag{A18} $$

corresponding to the graph representation in Fig. 21. We
may then define $\hat{k}_{ab}$ and $\hat{\bar{k}}_{ab}$ by the elementary
composition rule (A10)

$$ (k_a, [C]\, \bar{k}_a) \circ ([D]\, k_b, \bar{k}_b) = (\hat{k}_{ab}, \hat{\bar{k}}_{ab}) , \tag{A19} $$

giving the transformation⁶¹

$$ \begin{aligned}
\hat{k}_{ab} &= \frac{k_a\, [D]\, k_b}{[C]\, \bar{k}_a + [D]\, k_b} \\
\hat{\bar{k}}_{ab} &= \frac{[C]\, \bar{k}_a \bar{k}_b}{[C]\, \bar{k}_a + [D]\, k_b} .
\end{aligned} \tag{A20} $$

FIG. 21: Representation of a composite graph with internal
connections other than those to X as an effective elementary
graph. Highlights denote the absorption of other species into
modifications of effective rate constants coupled to X at a
and b. These are then used to define the elementary-form
rate constants $\hat{k}_{ab}$ and $\hat{\bar{k}}_{ab}$ in the reduced graph.



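Now removing the factors of [C] and [D] used to define
the hatted rate constants,

$$ \begin{aligned}
\hat{k}_{ab} &= [D]\, k_{ab} \\
\hat{\bar{k}}_{ab} &= [C]\, \bar{k}_{ab} ,
\end{aligned} \tag{A21} $$

we obtain a direct expression for the composition rule in
Eq. (A18), of

$$ \begin{aligned}
k_{ab} &= \frac{k_a k_b}{[C]\, \bar{k}_a + [D]\, k_b} \\
\bar{k}_{ab} &= \frac{\bar{k}_a \bar{k}_b}{[C]\, \bar{k}_a + [D]\, k_b} ,
\end{aligned} \tag{A22} $$

which is the interpretation of the graph reduction shown
in Fig. 22.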






⁶⁰ In this and the following examples, we consider single additional
species [C] and [D]. These may readily be generalized to a variety
of cases in which the additional reagents are products
$\prod_k [C_k]$ and $\prod_l [D_l]$.

⁶¹ Note that if [C] and [D] are the same species these cancel in
the numerator and denominator of Eq. (A20), and the same applies
to common factors in products $\prod_k [C_k]$ and $\prod_l [D_l]$.
Therefore these factors may simply be removed before the graph
reduction if desired, because they encode constraints that are
redundant with the conservation law already implied by $\dot{[X]} = 0$.
The irrelevance of redundant species in the graph reduction for
removal of [X] is radically different from the graphically
similar-looking role of a network catalyst, which is both an input
and an output of the same reaction. Network catalysts are essential
to the determination of reaction rates.



FIG. 22: The composite graph corresponding to the reduction
from Eq. (A15) to Eq. (A16).
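As a check on the composite rule of Eq. (A22), the sketch below compares the reduced rate law of Eq. (A16) with the current found by eliminating [X] from Eq. (A15) directly. It is an illustration under stated assumptions: the helper names (`compose_buffered`, `direct_currents`) and every numerical value are hypothetical, and [C], [D] are treated as fixed, buffered concentrations.

```python
from math import isclose

def compose_buffered(first, second, produced_with, consumed_with):
    """Composite rule: X is produced together with C and consumed together with D."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = produced_with * ka_bar + consumed_with * kb
    return (ka * kb / denom, ka_bar * kb_bar / denom)

def direct_currents(conc_a, conc_b, conc_c, conc_d, first, second):
    """Solve d[X]/dt = 0 for the pair of mass-action laws and return (J_a, J_b)."""
    ka, ka_bar = first
    kb, kb_bar = second
    conc_x = (conc_a * ka + conc_b * kb_bar) / (conc_c * ka_bar + conc_d * kb)
    j_a = conc_a * ka - conc_x * conc_c * ka_bar
    j_b = conc_x * conc_d * kb - conc_b * kb_bar
    return j_a, j_b

# Arbitrary illustrative values (assumptions, not taken from the text).
a, b = (2.0, 0.5), (3.0, 1.5)
conc_a, conc_b, conc_c, conc_d = 1.3, 0.4, 0.9, 2.2

k_ab, k_ab_bar = compose_buffered(a, b, conc_c, conc_d)
j_reduced = conc_a * conc_d * k_ab - conc_c * conc_b * k_ab_bar
j_a, j_b = direct_currents(conc_a, conc_b, conc_c, conc_d, a, b)

print(isclose(j_a, j_b), isclose(j_reduced, j_a))
```

Note that the lumped constants now depend on the buffered concentrations through the denominator, while their ratio still equals the product of the elementary equilibrium constants.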



d. Associativity for composite graphs



Associativity for composite graphs follows from the
associativity of the elementary composition rule (A14),
via the grouping (A19). To show how this works, we
demonstrate associativity for the minimal case shown
in Fig. 23. The important features are that the graph
"re-wiring" follows from composition of the rule
demonstrated in Fig. 22, and the composition rule for rate
constants permits consistent removal of the necessary factors
of reagent concentrations.

FIG. 23: A two-step reduction with other internal connections,
which may be performed by removing either X or Y
first.

The application of the elementary reduction to remove
X, corresponding to the second line in Fig. 23, yields
Eqs. (A19, A20). An equivalent removal of Y first (the
third line of Fig. 23) gives

$$ (k_b, [E]\, \bar{k}_b) \circ ([F]\, k_c, \bar{k}_c) = (\hat{k}_{bc}, \hat{\bar{k}}_{bc}) , \tag{A23} $$

with rule

$$ \begin{aligned}
\hat{k}_{bc} &= \frac{k_b\, [F]\, k_c}{[E]\, \bar{k}_b + [F]\, k_c} \\
\hat{\bar{k}}_{bc} &= \frac{[E]\, \bar{k}_b \bar{k}_c}{[E]\, \bar{k}_b + [F]\, k_c} .
\end{aligned} \tag{A24} $$

The two equivalent rules for removing whichever internal
node was not removed in the first reduction are

$$ \begin{aligned}
(\hat{k}_{ab}, [E]\, \hat{\bar{k}}_{ab}) \circ ([F]\, k_c, \bar{k}_c) &= (\hat{k}_{abc}, \hat{\bar{k}}_{abc}) \\
(k_a, [C]\, \bar{k}_a) \circ ([D]\, \hat{k}_{bc}, \hat{\bar{k}}_{bc}) &= (\hat{k}_{abc}, \hat{\bar{k}}_{abc}) .
\end{aligned} \tag{A25} $$

Composing these rules for intermediate rate constants,
we may check that

$$ \begin{aligned}
\hat{k}_{abc} &= \frac{\hat{k}_{ab}\, [F]\, k_c}{[E]\, \hat{\bar{k}}_{ab} + [F]\, k_c} \\
&= \frac{(k_a\, [D]\, k_b)\, [F]\, k_c}{[C]\, \bar{k}_a\, [E]\, \bar{k}_b + ([C]\, \bar{k}_a + [D]\, k_b)\, [F]\, k_c} \\
&= \frac{k_a\, [D]\, (k_b\, [F]\, k_c)}{[C]\, \bar{k}_a\, ([E]\, \bar{k}_b + [F]\, k_c) + [D]\, k_b\, [F]\, k_c} \\
&= \frac{k_a\, [D]\, \hat{k}_{bc}}{[C]\, \bar{k}_a + [D]\, \hat{k}_{bc}} ,
\end{aligned} \tag{A26} $$

and a similar equation follows for $\hat{\bar{k}}_{abc}$. Converting the
hatted forms to the normal reaction form produces the
rate equation

$$ [A]\,[D]\,[F]\, k_{abc} - [C]\,[E]\,[B]\, \bar{k}_{abc} = J_{abc} . \tag{A27} $$

We may directly obtain the rate constants $k_{abc}$, $\bar{k}_{abc}$
with the composition rule

$$ \begin{aligned}
(k_{abc}, \bar{k}_{abc}) &= (k_{ab}, \bar{k}_{ab}) \circ (k_c, \bar{k}_c) \\
&= (k_a, \bar{k}_a) \circ (k_{bc}, \bar{k}_{bc}) ,
\end{aligned} \tag{A28} $$

using the appropriate version of the graph-dependent
evaluation rule (A22) in each step. The resulting
composition (A28) is automatically associative, because it
satisfies the conversion

$$ \begin{aligned}
\hat{k}_{abc} &= [D]\,[F]\, k_{abc} \\
\hat{\bar{k}}_{abc} &= [C]\,[E]\, \bar{k}_{abc}
\end{aligned} \tag{A29} $$

with Eq. (A26), which is associative. As a final check, the
equilibrium constants in the normal reaction form satisfy
the necessary chain rule

$$ \frac{k_{abc}}{\bar{k}_{abc}} = \frac{k_a k_b k_c}{\bar{k}_a \bar{k}_b \bar{k}_c} . \tag{A30} $$

Intermediate (hatted) rate constants have been used here
to show how associativity is inherited from the base case.
The examples below work directly with the actual (unhatted)
rate constants, which keep the network in its literal
form at each reduction.
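The order-independence of the composite reduction can be illustrated numerically for the three-reaction case, taking the reactions to be A ⇌ X + C, X + D ⇌ Y + E, and Y + F ⇌ B as read off from the rate laws above. This is a sketch under assumptions: the helper name `compose_buffered` and all numerical values are hypothetical. The "graph-dependent" part of the rule is that the buffered factors passed at each step are the products of all species that accompany the removed node in the current, partially reduced graph.

```python
from math import isclose

def compose_buffered(first, second, produced_with, consumed_with):
    """Unhatted composite rule with buffered co-products and co-reactants."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = produced_with * ka_bar + consumed_with * kb
    return (ka * kb / denom, ka_bar * kb_bar / denom)

# Arbitrary illustrative rate constants and buffered concentrations.
a, b, c = (2.0, 0.5), (3.0, 1.5), (0.7, 0.2)
C, D, E, F = 0.9, 2.2, 1.4, 0.6

# Remove X first: X is made with C and used with D.
ab = compose_buffered(a, b, C, D)
# Reaction ab now makes Y together with C and E; c uses Y with F.
x_first = compose_buffered(ab, c, C * E, F)

# Remove Y first: Y is made with E and used with F.
bc = compose_buffered(b, c, E, F)
# Reaction bc now uses X together with D and F; a makes X with C.
y_first = compose_buffered(a, bc, C, D * F)

chain = (a[0] * b[0] * c[0]) / (a[1] * b[1] * c[1])
print(all(isclose(p, q) for p, q in zip(x_first, y_first)))
print(isclose(x_first[0] / x_first[1], chain))
```

The first comparison corresponds to the associativity statement, and the second to the chain rule for equilibrium constants.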



3. Application to the citric-acid cycle reactions 

Using this graph representation and the associated 
graph reductions, we may express the qualitative kinetics 
associated with network autocatalysis in the rTCA cycle. 
We use a minimal model network in which only the cy- 
cle intermediates are represented explicitly, and only the 
CHO stoichiometry is retained.⁶² External sources or
sinks are used to buffer only four compounds in the network,
which are CO2, H2, H2O, and a pool of reduced
carbon which we take to be acetate (ACE, or CH3COOH)
because it has the lowest free energy of formation of cycle
intermediates under reducing conditions (following
Ref. [266]) and is the natural drain compound [7].

The purpose of network reduction in such a model is 
to produce a graph in which each element corresponds to 
a specific control parameter for the interaction of conser- 
vation laws with non-equilibrium boundary conditions. 
CO2, H2, and H2O provide sources of carbon and
reductant, and an output for reduced oxygen atoms.
Because they comprise different ratios of three elements,
any set of concentrations is consistent with a Gibbs equi- 
librium, and the chemical potentials corresponding to the 
elements are preserved by the conservation laws of arbi- 
trary reactions. A fourth boundary condition for acetate 
cannot be linearly independent in equilibrium, and drives 
the steady-state reaction flux. 

Such a model is limited in many ways. The replacement
of explicit (and unknown) parasitic side reactions,
from all cycle intermediates, by a single loss rate for
acetate may fail to capture concentration-dependent losses,
in a way that cannot simply be absorbed into lumped rate
constants. Moreover, the rate constants themselves depend
on catalysts, and reasonable values for these in a
prebiotic or early-cellular context are unknown. Therefore
all critical properties of the model are expressed
relative to these rate constants. The reduction remains
meaningful, however, because the lumped-parameter rate
constants are controlled by the three buffered environmental
compounds CO2, H2, and H2O, leaving the network
flux to be controlled by the disequilibrium concentration
of acetate.

⁶² As noted above, phosphorylated intermediates and thioesters,
including the energetically important substrate-level
phosphorylation of citrate and succinate, are not represented.



a. The graph reduction sequence

The bipartite graph for the minimal rTCA network
in CHO compounds is shown in Fig. 24. All networks
in the text are generated by equivalent methods.
Highlighted nodes are those that can be removed by the base
reduction in Sec. A 2 a. Reactions are labeled with lowercase
Roman letters, and relative to the elementary
half-reaction rate constants, the lumped-parameter rate
constants are given by⁶³

$$ \begin{aligned}
k_{de} &= \frac{k_d k_e}{\bar{k}_d + k_e} \\
k_{ij} &= \frac{k_i k_j}{\bar{k}_i + k_j} \\
k_{ka} &= \frac{k_k k_a}{\bar{k}_k + k_a} ,
\end{aligned} \tag{A31} $$

with equivalent expressions for the $\bar{k}$s. These define the
elementary reactions in the reduced graph of Fig. 25.

FIG. 24: The projection of the TCA cycle onto CHO compounds.
Phosphates and thioesters are omitted, and the
stoichiometry of all acids refers to the protonated forms, so
that H2 stands for general two-electron reductants. Omission
of explicit representations of substrate-level phosphorylation
to form citryl-CoA and succinyl-CoA causes water elimination
to accompany carboxylation of acetate and succinate in
this graph, where in the actual cycle it would occur outside
the graph, in the formation of pyrophosphates. Highlighted
species are sole outputs and sole inputs of their associated
reactions, and can be removed with the elementary composition
rule (A11) of Sec. A 2 a. Legend: acetate (ACE), pyruvate
(PYR), oxaloacetate (OXA), malate (MAL), fumarate
(FUM), succinate (SUC), α-ketoglutarate (AKG), oxalosuccinate
(OXS), cis-aconitate (cAC), isocitrate (ISC), citrate
(CIT).

FIG. 25: Graph of Fig. 24 with its highlighted species removed.
Cis-aconitate (cAC, highlighted) has common factors
of [H2O], and is the next internal node to be removed, by the
rewrite rules of Sec. A 2 c but with the simplifying feature
that common factors cancel, so they resemble the base case.


One further reduction that follows the elementary rule
in Fig. 25 is removal of cis-aconitate (cAC), which involves
a common factor of the solvent [H2O]. The resulting
lumped-parameter rate constants are given by

$$ \begin{aligned}
k_{ijka} &= \frac{k_{ij} k_{ka}}{\bar{k}_{ij} + k_{ka}} \\
\bar{k}_{ijka} &= \frac{\bar{k}_{ij} \bar{k}_{ka}}{\bar{k}_{ij} + k_{ka}} .
\end{aligned} \tag{A32} $$

These lead to the graph of Fig. 26.

All further graph reductions require the composition
rules of Sec. A 2 c, and result in changes of the input
or output stoichiometries of the unreduced nodes. All
highlighted compounds in Fig. 26 may be removed, and
the resulting lumped-parameter rate constants are given
by

$$ \begin{aligned}
k_{bc} &= \frac{k_b k_c}{[\mathrm{H_2O}]\, \bar{k}_b + [\mathrm{CO_2}]\, k_c} \\
k_{def} &= \frac{k_{de} k_f}{[\mathrm{H_2O}]\, \bar{k}_{de} + [\mathrm{H_2}]\, k_f} \\
k_{defg} &= \frac{k_{def} k_g}{[\mathrm{H_2O}]\, \bar{k}_{def} + [\mathrm{H_2}]\,[\mathrm{CO_2}]\, k_g} \\
k_{defgh} &= \frac{k_{defg} k_h}{[\mathrm{H_2O}]^2\, \bar{k}_{defg} + [\mathrm{CO_2}]\, k_h} \\
k_{defghijka} &= \frac{k_{defgh} k_{ijka}}{[\mathrm{H_2O}]^2\, \bar{k}_{defgh} + [\mathrm{H_2}]\, k_{ijka}} .
\end{aligned} \tag{A33} $$

These define the maximal reduction of the original rTCA
graph, to the graph shown in Fig. 27.

⁶³ Here and below, we give formulae only for the forward
half-reaction rate constants $k$. Formulae for the backward
half-reaction rate constants $\bar{k}$ have corresponding forms as shown in
the preceding sections.

FIG. 26: Graph of Fig. 25 with cAC and its parallel links
to water removed. For all remaining species except acetate
(ACE), neither sources nor sinks are assumed, and these may
be removed with non-trivial instances of the composition rule
of Sec. A 2 c. Each of these removals changes the degree of
the remaining reactions, and thus changes the topology of the
graph.

FIG. 27: Graph of Fig. 26 with all internal nodes from linear
chains removed. [H2O], [H2], [CO2], and [ACE] are the
four molecular concentrations to which boundary sources are
coupled. [OXA] is retained as the last representation of the
network catalysis of the loop, indicated by highlighting of the
reaction in which OXA is input and output with equal
stoichiometry. In steady state, OXA is in equilibrium with ACE,
because it is not coupled to external currents.

The lumped-parameter rate equations for Fig. 27,
parametrized by lumped-parameter rate constants, are

$$ \begin{aligned}
J_{bc} &= [\mathrm{ACE}]\,[\mathrm{H_2}]\,[\mathrm{CO_2}]^2\, k_{bc} - [\mathrm{OXA}]\,[\mathrm{H_2O}]\, \bar{k}_{bc} \\
J_{defghijka} &= [\mathrm{OXA}]\,[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2\, k_{defghijka} - [\mathrm{OXA}]\,[\mathrm{ACE}]\,[\mathrm{H_2O}]^2\, \bar{k}_{defghijka} .
\end{aligned} \tag{A34} $$

In steady state $J_{bc} = 0$ and [OXA] may be replaced with
the equilibrium function

$$ [\mathrm{OXA}] = \frac{k_{bc}}{\bar{k}_{bc}} \frac{[\mathrm{H_2}]\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]}\, [\mathrm{ACE}] . \tag{A35} $$
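The full reduction sequence of this subsection can be traced numerically. The sketch below is illustrative only: the elementary rate constants are arbitrary placeholders (as the text notes, realistic values are unknown), the helper name `compose` is hypothetical, and the buffered factors passed at each step are assumptions read off from the denominators of Eq. (A33). It checks that the lumped equilibrium constant equals the product of the elementary ones, and that the loop current computed from the rate law with [OXA] at its equilibrium value agrees with the equivalent form written as an offset of [ACE] from its Gibbs-equilibrium value.

```python
from math import isclose, prod

def compose(first, second, produced_with=1.0, consumed_with=1.0):
    """Composition rule; with both factors equal to 1 it is the elementary rule."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = produced_with * ka_bar + consumed_with * kb
    return (ka * kb / denom, ka_bar * kb_bar / denom)

# Placeholder (forward, backward) constants for reactions a..k: assumptions only.
rates = {name: (1.0 + 0.3 * i, 0.5 + 0.1 * i) for i, name in enumerate("abcdefghijk")}
H2O, H2, CO2, ACE = 1.2, 0.8, 0.6, 0.05  # arbitrary buffered concentrations

# Elementary removals of MAL, ISC, CIT, then cAC (common water factors cancel).
k_de = compose(rates["d"], rates["e"])
k_ij = compose(rates["i"], rates["j"])
k_ka = compose(rates["k"], rates["a"])
k_ijka = compose(k_ij, k_ka)

# Composite removals of PYR, FUM, SUC, AKG, OXS.
k_bc = compose(rates["b"], rates["c"], H2O, CO2)
k_def = compose(k_de, rates["f"], H2O, H2)
k_defg = compose(k_def, rates["g"], H2O, H2 * CO2)
k_defgh = compose(k_defg, rates["h"], H2O**2, CO2)
k_rtca = compose(k_defgh, k_ijka, H2O**2, H2)

K_rtca = k_rtca[0] / k_rtca[1]
K_chain = prod(rates[n][0] / rates[n][1] for n in "defghijka")
K_bc = k_bc[0] / k_bc[1]

# Loop current with OXA equilibrated against ACE, in two equivalent forms.
OXA = K_bc * H2 * CO2**2 / H2O * ACE
J_rate_law = OXA * (H2**4 * CO2**2 * k_rtca[0] - ACE * H2O**2 * k_rtca[1])
ACE_G = K_rtca * H2**4 * CO2**2 / H2O**2
J_offset = K_bc * k_rtca[1] * H2 * CO2**2 * H2O * ACE * (ACE_G - ACE)

print(isclose(K_rtca, K_chain), isclose(J_rate_law, J_offset))
```

Changing any buffered concentration changes the lumped rate constants but leaves the ratio `K_rtca` fixed, which is the sense in which the environment controls kinetics but not the equilibrium constant.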



b. Network reaction fluxes and their control parameters 

For the remainder of the appendix we replace the subscript
defghijka with designation rTCA in currents $J$,
half-reaction rate constants $k$, $\bar{k}$, and equilibrium constants
$K$. Dimensionally, the rate constants require the concentration
of OXA in the mass-action law, and so presume
that the anaplerotic segment bc has been handled.

Plugging Eq. (A35) into the second rate equation of
Eq. (A34), and supposing [OXA] is in equilibrium with
[ACE] at a (non-equilibrium) steady state for the network
as a whole, we obtain the only independent mass-action
rate equation for the reduced network. This is the current
producing acetate:

$$ J_{\mathrm{rTCA}} = \bar{k}_{\mathrm{rTCA}} \frac{k_{bc}}{\bar{k}_{bc}}\, [\mathrm{H_2}]\,[\mathrm{CO_2}]^2\,[\mathrm{H_2O}]\,[\mathrm{ACE}] \left( \frac{k_{\mathrm{rTCA}}}{\bar{k}_{\mathrm{rTCA}}} \frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} - [\mathrm{ACE}] \right) . \tag{A36} $$

The first term in parentheses in Eq. (A36) is the concentration
at which acetate would be in equilibrium with the
inorganic inputs, which we denote⁶⁴

$$ [\mathrm{ACE}]_G \equiv K_{\mathrm{rTCA}} \frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} . \tag{A37} $$

Therefore the network response is proportional to the
offset of [ACE] from its equilibrium value, with a rate
constant that depends on the particular contributions of
chemical potential from [CO2] and reductant.

⁶⁴ Although the lumped-parameter rate constant in this relation
appears complex, the consistency conditions with single-reaction
equilibrium constants ensure that $k_{\mathrm{rTCA}} / \bar{k}_{\mathrm{rTCA}}$ is independent
of synthetic pathway and equal to the exponential of the Gibbs
free energy of formation.

4. Interaction of Wood-Ljungdahl with rTCA

We may envision an early Wood-Ljungdahl "feeder"
pathway to acetyl-CoA as a reaction with the same
stoichiometry as rTCA for the creation of acetate, but fixed
half-reaction rate constants that do not depend on the
internal concentrations in the network. This may be a
pre-pterin mineral pathway [56], in which rate constants
are determined by the abiotic environment, or an early
pathway using pterin-like cofactors, if the concentrations
of these are somehow buffered from the instantaneous
flows through the reductive pathway. Labeling this "linear"
effective reaction WL, the rate equation becomes

$$ J_{\mathrm{WL}} = \bar{k}_{\mathrm{WL}}\, [\mathrm{H_2O}]^2 \left( \frac{k_{\mathrm{WL}}}{\bar{k}_{\mathrm{WL}}} \frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} - [\mathrm{ACE}] \right) . \tag{A38} $$

Note that $k_{\mathrm{WL}} / \bar{k}_{\mathrm{WL}} = k_{\mathrm{rTCA}} / \bar{k}_{\mathrm{rTCA}}$ because both are
expressions for the equilibrium constant, which depends
only on the free energy of reaction.

To understand the performance of a joint network in
the presence of losses, as the simplest case introduce a
reaction labeled Env standing for dilution of acetate to an
environment at zero concentration. The dilution current
becomes

$$ J_{\mathrm{Env}} = k_{\mathrm{Env}}\, [\mathrm{ACE}] . \tag{A39} $$

At a non-equilibrium steady state the total losses must
equal the total supply currents, so that

$$ J_{\mathrm{Env}} = J_{\mathrm{rTCA}} + J_{\mathrm{WL}} . \tag{A40} $$

The un-reduced equation for steady-state currents can
be written, by substituting Eqs. (A36), (A38), and (A39) into Eq. (A40),

$$ \begin{aligned}
J_{\mathrm{rTCA}} + J_{\mathrm{WL}} &= [\mathrm{H_2O}]^2 \left\{ \bar{k}_{\mathrm{rTCA}} \frac{k_{bc}}{\bar{k}_{bc}} \frac{[\mathrm{H_2}]\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]}\, [\mathrm{ACE}] + \bar{k}_{\mathrm{WL}} \right\} \left( [\mathrm{ACE}]_G - [\mathrm{ACE}] \right) \\
&= k_{\mathrm{Env}}\, [\mathrm{ACE}] = J_{\mathrm{Env}} .
\end{aligned} \tag{A41} $$



The graph corresponding to this model for rate laws is 
shown in Fig. [28] 







FIG. 28: Hypergraph model for parallel reactions through 
the rTCA and WL pathways, coupled to a linear drain re- 
action representing dilution of acetate by the environment. 



The variable that characterizes the "impedance" of a
chemical reaction network, and displays thresholds for
autocatalysis when these exist, is the ratio of the output
acetate concentration to the value that would exist in a
Gibbs equilibrium with the inputs:

$$ x \equiv \frac{[\mathrm{ACE}]}{[\mathrm{ACE}]_G} . \tag{A42} $$

For a network with no reaction barriers (either in rate
constants or due to limitations of network catalysts), the
output $x \to 1$.

The two control parameters that govern the relative
contributions of the rTCA loop and the direct WL feeder
are

$$ \begin{aligned}
z_{\mathrm{rTCA}} &\equiv \frac{\bar{k}_{\mathrm{rTCA}}}{k_{\mathrm{Env}}} \frac{k_{bc}}{\bar{k}_{bc}}\, [\mathrm{H_2}]\,[\mathrm{CO_2}]^2\,[\mathrm{H_2O}]\, [\mathrm{ACE}]_G \\
z_{\mathrm{WL}} &\equiv \frac{\bar{k}_{\mathrm{WL}}\, [\mathrm{H_2O}]^2}{k_{\mathrm{Env}}} .
\end{aligned} \tag{A43} $$

Each control parameter is a ratio of lumped half-reaction
rates that feed [ACE] to the environment dilution constant
$k_{\mathrm{Env}}$ through which it drains.

In terms of $z_{\mathrm{WL}}$ and $z_{\mathrm{rTCA}}$, the normalized concentration
$x$, which is proportional by $k_{\mathrm{Env}}$ to the total current
through the system, satisfies

$$ x = \left( z_{\mathrm{rTCA}}\, x + z_{\mathrm{WL}} \right) \left( 1 - x \right) . \tag{A44} $$

The solution to Eq. (A44) is shown versus base-10 logarithms
of $z_{\mathrm{rTCA}}$ and $z_{\mathrm{WL}}$ in Fig. 12 in the main text.

The critical (unsupported) response of the rTCA loop occurs
at $z_{\mathrm{WL}} \to 0$ and $z_{\mathrm{rTCA}} = 1$. It is identified with the
discontinuity in the derivative $dx/dz_{\mathrm{rTCA}}$ at $z_{\mathrm{rTCA}} = 1$
and the exactly zero value of $x$ for $z_{\mathrm{rTCA}} < 1$. As $z_{\mathrm{WL}}$
increases from zero, the transition becomes smooth, and
a nonzero concentration $x$ is maintained against dilution
at all values of $z_{\mathrm{rTCA}}$.
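The threshold behavior described here can be reproduced with a few lines of code. The sketch below assumes that the steady-state balance takes the form (z_rTCA x + z_WL)(1 − x) = x, which is what the supply and dilution currents give when the balance J_Env = J_rTCA + J_WL is divided by k_Env [ACE]_G; the function name `normalized_acetate` and the sample parameter values are hypothetical. The balance is a quadratic in x, and the code returns its non-negative root.

```python
from math import sqrt

def normalized_acetate(z_rtca, z_wl):
    """Non-negative root x of (z_rtca * x + z_wl) * (1 - x) = x.

    Rearranged, this is z_rtca * x**2 + (1 + z_wl - z_rtca) * x - z_wl = 0.
    """
    if z_rtca == 0.0:
        # Pure feeder limit: the equation becomes linear in x.
        return z_wl / (1.0 + z_wl)
    b = 1.0 + z_wl - z_rtca
    disc = b * b + 4.0 * z_rtca * z_wl
    return (-b + sqrt(disc)) / (2.0 * z_rtca)

# Scan the loop parameter with and without a small feeder contribution.
for z_wl in (0.0, 0.01):
    row = [normalized_acetate(z, z_wl) for z in (0.5, 1.0, 2.0, 10.0)]
    print(z_wl, [round(x, 4) for x in row])
```

With z_WL = 0 the root is exactly zero below z_rTCA = 1 and equals 1 − 1/z_rTCA above it, which is the sharp transition; any positive z_WL lifts x above zero everywhere and rounds off the corner.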
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